Corona Virus

Coronaviruses are zoonotic viruses, meaning they are transmitted between animals and people.
Symptoms include fever, cough, and breathing difficulties.
In severe cases, infection can cause pneumonia, severe acute respiratory syndrome (SARS), kidney failure and even death.
Infections can also be asymptomatic, meaning a person can carry the virus without experiencing any symptoms.

Novel coronavirus (nCoV)

A novel coronavirus (nCoV) is a new strain that has not been previously identified in humans.

COVID-19 (Corona Virus Disease 2019)

Caused by the SARS-CoV-2 coronavirus.
First identified in Wuhan, Hubei, China. The earliest reported symptoms date to November 2019.
The first cases were linked to contact with the Huanan Seafood Wholesale Market, which sold live animals.
On 30 January 2020 the WHO declared the outbreak a Public Health Emergency of International Concern.

Importing Necessary Libraries:
In [1]:
# import the necessary libraries
import numpy as np 
import pandas as pd 
import os
from datetime import datetime, timedelta
In [2]:
# Visualisation libraries
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set()
import pycountry
import plotly.offline as py
import plotly.express as px
from ipywidgets import widgets
from IPython.display import display
!jupyter nbextension enable --py --sys-prefix widgetsnbextension

py.init_notebook_mode(connected=True)
import folium 
from folium import plugins
plt.style.use("fivethirtyeight")# for pretty graphs
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
Enabling notebook extension jupyter-js-widgets/extension...
      - Validating: ok
In [3]:
# Increase the default plot size and set the color scheme
plt.rcParams['figure.figsize'] = 8, 5
# Disable warnings 
import warnings
warnings.filterwarnings('ignore')
In [4]:
#!pip install pywaffle
from pywaffle import Waffle
In [10]:
plt.rcParams['image.cmap'] = 'viridis'
Loading Data:
In [6]:
df_cov = pd.read_csv('covid_19_data.csv')
df_cnf = pd.read_csv('time_series_covid_19_confirmed.csv')
df_rec = pd.read_csv('time_series_covid_19_recovered.csv')
df_death = pd.read_csv('time_series_covid_19_deaths.csv')
df_cov.drop(columns=['SNo'],inplace=True)

df_cov['ObservationDate'] = pd.to_datetime(df_cov['ObservationDate'] )
df_cov = df_cov.set_index('ObservationDate')
Visualizing Spread of Covid-19 across the globe:
In [7]:
df_wrld = df_cov.loc[:,['Confirmed','Deaths','Recovered']]
df_wrld = df_wrld.groupby(['ObservationDate']).sum()
print(df_wrld.head())
# df_wrld.head()
cnf_data = go.Scatter(x=df_wrld.index,
                         y=df_wrld.Confirmed, name = "Confirmed")
dea_data = go.Scatter(x=df_wrld.index,
                         y=df_wrld.Deaths,
                     yaxis='y2', name = "Deaths")
rec_data = go.Scatter(x=df_wrld.index,
                         y=df_wrld.Recovered,
                     yaxis='y3', name = "Recovered")

layout = go.Layout(title='COVID-19 progression', xaxis=dict(title='Date'),
                   yaxis=dict(color='blue'),
                  yaxis2=dict(color='red',
                               overlaying='y', side='right'),
                  yaxis3=dict(color='green',
                               overlaying='y', side='left'),
                  template="plotly_dark")

fig = go.Figure(data=[cnf_data,dea_data,rec_data], layout=layout)
fig.show()
                 Confirmed  Deaths  Recovered
ObservationDate                              
2020-01-22           555.0    17.0       28.0
2020-01-23           653.0    18.0       30.0
2020-01-24           941.0    26.0       36.0
2020-01-25          1438.0    42.0       39.0
2020-01-26          2118.0    56.0       52.0
The spread of the virus gained momentum in the first week of February and increased exponentially thereafter.
By March 25, 2020:
     Confirmed Cases: 467.594k
              Deaths:  21.181k
               Cured: 113.770k
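To put a rough number on the early growth, the worldwide confirmed totals printed above for 22 to 26 January can be turned into an average daily growth factor and a doubling time. This is a minimal sketch: the helper names are illustrative, and the estimate covers only those five days, so it should not be read as a figure for the whole period.

```python
import math

# Worldwide confirmed counts for 2020-01-22 .. 2020-01-26, copied from the output above
confirmed = [555, 653, 941, 1438, 2118]

def mean_growth_factor(counts):
    """Geometric mean of the day-over-day growth factor."""
    steps = len(counts) - 1
    return (counts[-1] / counts[0]) ** (1 / steps)

def doubling_time(growth_factor):
    """Days needed to double at a constant daily growth factor."""
    return math.log(2) / math.log(growth_factor)

g = mean_growth_factor(confirmed)
print(f"average daily growth factor: {g:.2f}")
print(f"doubling time: {doubling_time(g):.1f} days")
```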
How did the virus propagate from Mainland China to Rest of the World?
In [8]:
cnf_period = df_cnf.drop(columns=['Province/State','Country/Region','Lat','Long']).columns
death_period = df_death.drop(columns=['Province/State','Country/Region','Lat','Long']).columns
rec_period = df_rec.drop(columns=['Province/State','Country/Region','Lat','Long']).columns

df_cnf1 = df_cnf.melt(id_vars=['Province/State','Country/Region','Lat','Long'],value_vars=cnf_period,var_name='Date',value_name='count')
df_death1 = df_death.melt(id_vars=['Province/State','Country/Region','Lat','Long'],value_vars=death_period,var_name='Date',value_name='count')
df_rec1 = df_rec.melt(id_vars=['Province/State','Country/Region','Lat','Long'],value_vars=rec_period,var_name='Date',value_name='count')





df_cnf1.dropna(subset=['count', 'Country/Region'],inplace=True) 
df_death1.dropna(subset=['count', 'Country/Region'],inplace=True) 
df_rec1.dropna(subset=['count', 'Country/Region'],inplace=True) 
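The `melt` calls above reshape the time-series files from wide format (one column per date) to long format (one row per location and date), which is the shape `animation_frame` needs. The sketch below shows the same reshaping on a toy frame; the rows are illustrative values, and parsing `Date` into real datetimes is an optional extra step that the cell above does not perform.

```python
import pandas as pd

# Toy wide-format frame shaped like the time-series files (illustrative values)
wide = pd.DataFrame({
    "Province/State": [None, "Hubei"],
    "Country/Region": ["Italy", "China"],
    "Lat": [43.0, 30.97],
    "Long": [12.0, 112.27],
    "1/22/20": [0, 444],
    "1/23/20": [0, 444],
})

id_cols = ["Province/State", "Country/Region", "Lat", "Long"]
# Every column not listed in id_vars becomes a (Date, count) pair
long = wide.melt(id_vars=id_cols, var_name="Date", value_name="count")

# Optional: parse the month/day/year strings so dates sort chronologically
long["Date"] = pd.to_datetime(long["Date"], format="%m/%d/%y")
print(long)
```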
In [9]:
fig = px.scatter_geo(df_cnf1, lat='Lat',lon='Long',color='Country/Region',
                     hover_name="Country/Region", size='count',
                     animation_frame="Date",
                     projection="natural earth",
                    title='Patient Confirm Progression ',template="plotly_dark")
# fig['data'][0].update(mode='markers+text', textposition='bottom center',
#                       text=df_cnf['Country/Region'].map('{}'.format).astype(str)+' '+\
#                       str(df_cnf['3/20/20']))


#     time.sleep(1)
    
fig.show()

The pandemic that started in China spread across East Asia by the end of January, and the virus then slowly propagated to other countries. By the start of March, Western European countries, especially Italy and Spain, were overwhelmed by its sudden onset. The Gulf region was not spared either, and Iran was the most affected country there. By the end of March, the virus was wreaking havoc across most of the world, chiefly in the US, Italy, Iran and Spain.

Analysing the pattern in Deaths...

In [11]:
fig = px.scatter_geo(df_death1, lat='Lat',lon='Long',color='Country/Region',
                     hover_name="Country/Region", size='count',
                     animation_frame="Date",
                     projection="natural earth",
                    title='Patient Death Progression ',template="plotly_dark")
#fig['data'][0].update(mode='markers+text', textposition='bottom center',
                       #text=df_cnf['Country/Region'].map('{}'.format).astype(str)+' '+\
                       #str(df_cnf['3/20/20']))

#time.sleep(1)
    
fig.show()

Until mid-February, hardly any deaths were reported outside Mainland China. By the start of March, the death toll had begun to rise in East Asia, the Gulf countries, Western Europe (Italy and Spain) and the US. By the end of March the virus had taken hold of almost the entire world, and the USA and Western Europe had turned into death hotspots. In this phase, a few deaths were also reported from South America and Africa, and some South Asian countries (India and Pakistan) were affected as well.

Nations' competency to tackle Corona..
Analysing Countrywise Recovery Status of Corona Patients:
In [12]:
fig = px.scatter_geo(df_rec1, lat='Lat',lon='Long',color='Country/Region',
                     hover_name="Country/Region", size='count',
                     animation_frame="Date",
                     projection="natural earth",
                    title='Patient Recovered progression ',template="plotly_dark")
# fig['data'][0].update(mode='markers+text', textposition='bottom center',
#                       text=df_cnf['Country/Region'].map('{}'.format).astype(str)+' '+\
#                       str(df_cnf['3/20/20']))


#     time.sleep(1)
    
fig.show()

In early February, China began to cope with the situation, and patient recoveries began to surge there. Following China's steps to curb the death rate, precautionary measures and lockdowns were soon implemented in other parts of the world, which facilitated containment of the virus and improved recovery rates. By the end of March, China had managed to treat its patients successfully, with around 74k recoveries. Positive outcomes were seen in the rest of the world too: Italy counted 11k recoveries, while Iran and Spain accounted for 11k and 9.5k recoveries respectively. The USA, however, seemed unable to cope with the pandemic, as the crisis there continued.

Analysing Impact of Weather Conditions on COVID-19:

In [13]:
#Loading Clean Dataset
cleaned_data = pd.read_csv('covid_19_clean_complete.csv', parse_dates=['Date'])

cleaned_data.rename(columns={'ObservationDate': 'date', 
                     'Province/State':'state',
                     'Country/Region':'country',
                     'Last Update':'last_updated',
                     'Confirmed': 'confirmed',
                     'Deaths':'deaths',
                     'Recovered':'recovered'
                    }, inplace=True)

# cases 
cases = ['confirmed', 'deaths', 'recovered', 'active']

# Active Case = confirmed - deaths - recovered
cleaned_data['active'] = cleaned_data['confirmed'] - cleaned_data['deaths'] - cleaned_data['recovered']

# replacing Mainland china with just China
cleaned_data['country'] = cleaned_data['country'].replace('Mainland China', 'China')

# filling missing values 
cleaned_data[['state']] = cleaned_data[['state']].fillna('')
cleaned_data[cases] = cleaned_data[cases].fillna(0)
cleaned_data.rename(columns={'Date':'date'}, inplace=True)

data = cleaned_data

display(data.head())
data.info()
   state      country       Lat     Long        date  confirmed  deaths  recovered  active
0         Afghanistan   33.0000  65.0000  2020-01-22          0       0          0       0
1             Albania   41.1533  20.1683  2020-01-22          0       0          0       0
2             Algeria   28.0339   1.6596  2020-01-22          0       0          0       0
3             Andorra   42.5063   1.5218  2020-01-22          0       0          0       0
4              Angola  -11.2027  17.8739  2020-01-22          0       0          0       0
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16884 entries, 0 to 16883
Data columns (total 9 columns):
state        16884 non-null object
country      16884 non-null object
Lat          16884 non-null float64
Long         16884 non-null float64
date         16884 non-null datetime64[ns]
confirmed    16884 non-null int64
deaths       16884 non-null int64
recovered    16884 non-null int64
active       16884 non-null int64
dtypes: datetime64[ns](1), float64(2), int64(4), object(2)
memory usage: 1.2+ MB
In [14]:
# Check if the data is updated
print("External Data")
print(f"Earliest Entry: {data['date'].min()}")
print(f"Last Entry:     {data['date'].max()}")
print(f"Total Days:     {data['date'].max() - data['date'].min()}")
External Data
Earliest Entry: 2020-01-22 00:00:00
Last Entry:     2020-03-28 00:00:00
Total Days:     66 days 00:00:00
In [15]:
def p2f(x):
    """
    Convert urban percentage to float
    """
    try:
        return float(x.strip('%'))/100
    except (ValueError, AttributeError):
        # Unparsable entries become NaN
        return np.nan

def age2int(x):
    """
    Convert Age to integer
    """
    try:
        return int(x)
    except (ValueError, TypeError):
        return np.nan

def fert2float(x):
    """
    Convert Fertility Rate to float
    """
    try:
        return float(x)
    except (ValueError, TypeError):
        return np.nan


countries_df = pd.read_csv("population_by_country_2020.csv", converters={'Urban Pop %':p2f, 'Fert. Rate':fert2float,
                                                                        'Med. Age':age2int})
countries_df.rename(columns={'Country (or dependency)': 'country',
                             'Population (2020)' : 'population',
                             'Density (P/Km²)' : 'density',
                             'Fert. Rate' : 'fertility',
                             'Med. Age' : "age",
                             'Urban Pop %' : 'urban_percentage'}, inplace=True)



countries_df['country'] = countries_df['country'].replace('United States', 'US')
countries_df = countries_df[["country", "population", "density", "fertility", "age", "urban_percentage"]]

countries_df.head()
Out[15]:
     country  population  density  fertility   age  urban_percentage
0      China  1439323776      153        1.7  38.0              0.61
1      India  1380004385      464        2.2  28.0              0.35
2         US   331002651       36        1.8  38.0              0.83
3  Indonesia   273523615      151        2.3  30.0              0.56
4   Pakistan   220892340      287        3.6  23.0              0.35
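The `converters` argument of `pd.read_csv` applies a function to every raw string in a column before type inference, which is how the percentage column ends up as a fraction. The sketch below runs `p2f` on a tiny in-memory CSV; the two rows and the `N.A.` missing-value marker are assumptions made for illustration, not a description of the real file.

```python
import io
import numpy as np
import pandas as pd

def p2f(x):
    """Convert a percentage string such as '61 %' to a fraction; NaN if unparsable."""
    try:
        return float(x.strip('%')) / 100
    except (ValueError, AttributeError):
        return np.nan

# Two made-up rows; the second uses 'N.A.' as a hypothetical missing-value marker
csv_text = "Country (or dependency),Urban Pop %\nA,61 %\nB,N.A.\n"
demo = pd.read_csv(io.StringIO(csv_text), converters={"Urban Pop %": p2f})
print(demo)
```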
In [16]:
data = cleaned_data
In [17]:
#Merging Covid_19 Data and Countries data
data = pd.merge(data, countries_df, on='country')
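`pd.merge` performs an inner join by default, so any country whose name is spelled differently in the two tables is dropped without a warning. One way to surface such names before merging is a left join with `indicator=True`. The frames and spellings below are toy values chosen for illustration, not taken from the datasets.

```python
import pandas as pd

# Toy frames; the country spellings and numbers are illustrative assumptions
cases = pd.DataFrame({"country": ["US", "South Korea", "Italy"],
                      "confirmed": [100, 50, 80]})
population = pd.DataFrame({"country": ["US", "Korea, South", "Italy"],
                           "population": [1000, 2000, 3000]})

# A left join keeps every case row; the _merge column flags rows with no match
checked = cases.merge(population, on="country", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "country"].tolist()
print(unmatched)

# The default inner join silently keeps only the matching rows
inner = cases.merge(population, on="country")
```

Names reported as unmatched can then be harmonised with `replace`, as done above for 'United States' and 'US'.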
In [18]:
#Loading Temperature Data and performing some Preprocessing
df_temperature = pd.read_csv("temperature_dataframe.csv")
df_temperature['country'] = df_temperature['country'].replace('USA', 'US')
df_temperature['country'] = df_temperature['country'].replace('UK', 'United Kingdom')
df_temperature = df_temperature[["country", "province", "date", "humidity", "sunHour", "tempC", "windspeedKmph"]].reset_index()
df_temperature.rename(columns={'province': 'state'}, inplace=True)
df_temperature["date"] = pd.to_datetime(df_temperature['date'])
df_temperature['state'] = df_temperature['state'].fillna('')
df_temperature.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16677 entries, 0 to 16676
Data columns (total 8 columns):
index            16677 non-null int64
country          16677 non-null object
state            16677 non-null object
date             16677 non-null datetime64[ns]
humidity         16500 non-null float64
sunHour          16500 non-null float64
tempC            16500 non-null float64
windspeedKmph    16500 non-null float64
dtypes: datetime64[ns](1), float64(4), int64(1), object(2)
memory usage: 1.0+ MB
In [19]:
# Merging temperature data on Covid19 Data
data = data.merge(df_temperature, on=['country','date', 'state'], how='inner')
data['mortality_rate'] = data['deaths'] / data['confirmed']
data.head()
Out[19]:
state country Lat Long date confirmed deaths recovered active population density fertility age urban_percentage index humidity sunHour tempC windspeedKmph mortality_rate
0 Afghanistan 33.0 65.0 2020-01-22 0 0 0 0 38928346 60 4.6 18.0 0.25 0 65.0 8.7 -1.0 8.0 NaN
1 Afghanistan 33.0 65.0 2020-01-23 0 0 0 0 38928346 60 4.6 18.0 0.25 1 59.0 8.7 -3.0 8.0 NaN
2 Afghanistan 33.0 65.0 2020-01-24 0 0 0 0 38928346 60 4.6 18.0 0.25 2 71.0 7.1 0.0 7.0 NaN
3 Afghanistan 33.0 65.0 2020-01-25 0 0 0 0 38928346 60 4.6 18.0 0.25 3 79.0 8.7 0.0 7.0 NaN
4 Afghanistan 33.0 65.0 2020-01-26 0 0 0 0 38928346 60 4.6 18.0 0.25 4 64.0 8.7 -1.0 8.0 NaN
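The NaN values in `mortality_rate` above come from rows where both `deaths` and `confirmed` are zero, since 0/0 is undefined. The sketch below reproduces that on toy rows and shows one alternative that masks zero denominators explicitly; the alternative is a suggestion, not the method used in this notebook.

```python
import numpy as np
import pandas as pd

# Toy rows: no cases yet, cases without deaths, cases with deaths
df = pd.DataFrame({"confirmed": [0, 40, 200], "deaths": [0, 0, 6]})

# Plain division: 0/0 yields NaN
df["mortality_rate"] = df["deaths"] / df["confirmed"]

# Alternative: turn zero denominators into NaN first, so x/0 can never produce inf
denominator = df["confirmed"].where(df["confirmed"] > 0)
df["mortality_rate_safe"] = df["deaths"].div(denominator)
print(df)
```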
In [20]:
data.describe()
Out[20]:
Lat Long confirmed deaths recovered active population density fertility age urban_percentage index humidity sunHour tempC windspeedKmph mortality_rate
count 11638.000000 11638.000000 11638.000000 11638.000000 11638.000000 11638.000000 1.163800e+04 11638.000000 11458.000000 11458.000000 11398.000000 11638.000000 11520.000000 11520.000000 11520.000000 11520.000000 5293.000000
mean 22.877814 30.081798 380.531449 13.829868 151.797646 214.903935 2.782792e+08 334.411583 2.238776 34.298831 0.653100 6633.882454 63.516146 8.646493 16.397049 12.361285 0.013979
std 24.821627 70.595019 3988.533078 178.721725 2162.934077 2277.174401 5.360348e+08 1978.759922 1.020779 7.975900 0.187607 4196.988880 20.604686 2.599630 11.509561 7.581626 0.065837
min -41.454500 -123.120700 0.000000 0.000000 0.000000 0.000000 3.812800e+04 2.000000 1.200000 17.000000 0.150000 0.000000 5.000000 1.500000 -21.000000 1.000000 0.000000
25% 8.619500 -8.224500 0.000000 0.000000 0.000000 0.000000 5.106626e+06 31.000000 1.700000 29.000000 0.560000 3089.250000 49.000000 6.900000 8.000000 7.000000 0.000000
50% 27.610400 27.953400 0.000000 0.000000 0.000000 0.000000 2.549988e+07 108.000000 1.800000 38.000000 0.630000 6230.500000 69.000000 8.700000 15.000000 10.000000 0.000000
75% 42.315400 101.058300 18.000000 0.000000 1.000000 9.000000 8.378394e+07 153.000000 2.400000 41.000000 0.810000 9791.750000 79.000000 11.600000 27.000000 16.000000 0.007916
max 64.963100 174.886000 67800.000000 4825.000000 58946.000000 50633.000000 1.439324e+09 26337.000000 6.100000 48.000000 0.980000 16676.000000 99.000000 14.000000 45.000000 65.000000 1.000000
In [21]:
#Data Processing
# Select several columns with a list; recent pandas versions reject tuple-style selection
temp_gdf = data.groupby(['date', 'country'])[['tempC', 'humidity']].mean()
temp_gdf = temp_gdf.reset_index()
temp_gdf['date'] = pd.to_datetime(temp_gdf['date'])
temp_gdf['date'] = temp_gdf['date'].dt.strftime('%m/%d/%Y')

temp_gdf['tempC_pos'] = temp_gdf['tempC'] - temp_gdf['tempC'].min()  # To use it with size

wind_gdf = data.groupby(['date', 'country'])['windspeedKmph'].max()
wind_gdf = wind_gdf.reset_index()
wind_gdf['date'] = pd.to_datetime(wind_gdf['date'])  # parse wind_gdf's own dates, not temp_gdf's
wind_gdf['date'] = wind_gdf['date'].dt.strftime('%m/%d/%Y')
In [22]:
target_gdf = data.groupby(['date', 'country'])[['confirmed', 'deaths']].sum()
target_gdf = target_gdf.reset_index()
target_gdf['date'] = pd.to_datetime(target_gdf['date'])
target_gdf['date'] = target_gdf['date'].dt.strftime('%m/%d/%Y')
Visualizing Temperature Changes across the globe for last two months:
In [23]:
fig = px.scatter_geo(temp_gdf.fillna(0), locations="country", locationmode='country names', 
                     color="tempC", size='tempC_pos', hover_name="country", 
                     range_color= [-20, 45], 
                     projection="natural earth", animation_frame="date", 
                     title='Temperature by country', color_continuous_scale="portland", template="plotly_dark")
# fig.update(layout_coloraxis_showscale=False)
fig.show()

Countries such as Canada, Ireland and Russia experienced the lowest temperatures, around -10°C to 0°C. Northern Hemisphere countries averaged about 20°C throughout the period. Countries near the Equator saw significant temperature changes over the period; by mid-March their mean temperature was about 30°C. Countries in the Southern Hemisphere averaged around 25°C.

Visualizing Humidity by Country:

The second figure shows humidity by country. Unlike temperature, humidity shows no clear relation to location. Humidity is relatively low in China, while it is consistently high in the Europe region.

In [24]:
fig = px.scatter_geo(temp_gdf.fillna(0), locations="country", locationmode='country names', 
                     color="humidity", size='humidity', hover_name="country", 
                     range_color= [0, 100], 
                     projection="natural earth", animation_frame="date", 
                     title='COVID-19: Humidity by country', color_continuous_scale="portland", template="plotly_dark")
# fig.update(layout_coloraxis_showscale=False)
fig.show()
Impact of Weather Condition on Covid-19's Spread:
In [25]:
gdf = pd.merge(target_gdf, temp_gdf, on=['date', 'country'])
gdf['confirmed_log1p'] = np.log1p(gdf['confirmed'])
gdf['deaths_log1p'] = np.log1p(gdf['deaths'])
gdf['mortality_rate'] = gdf['deaths'] / gdf['confirmed']

gdf = pd.merge(gdf, wind_gdf, on=['date', 'country'])
In [26]:
fig = px.scatter_geo(gdf.fillna(0), locations="country", locationmode='country names', 
                     color="tempC", size='confirmed', hover_name="country", 
                     range_color= [-20, 45], 
                     projection="natural earth", animation_frame="date", 
                     title='COVID-19: Confirmed VS Temperature by country', color_continuous_scale="portland", template="plotly_dark")
# fig.update(layout_coloraxis_showscale=False)
fig.show()

It can be seen that the outbreak started in China when temperatures were cold, but its spread was not much affected even when temperatures in China rose. The spread in Europe also started at relatively moderate temperatures (around 20°C). Thus, COVID-19's high contagiousness might be the reason behind its wide spread, despite the virus's characteristic weakness against high temperatures.

In the following visualization, circle size is shown on a log scale to indicate how the spread affected countries with fewer cases.
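The log scale here is `np.log1p`, i.e. log(1 + x). It is defined at zero, so countries with no cases still get a valid size, and it compresses large counts so that small outbreaks stay visible next to large ones. A minimal sketch with illustrative counts:

```python
import numpy as np

# Illustrative confirmed counts spanning several orders of magnitude
confirmed = np.array([0, 9, 999, 99_999])

# log1p(x) = log(1 + x): zero maps to zero, large values are compressed
scaled = np.log1p(confirmed)
print(np.round(scaled, 2))

# expm1 inverts the transform when the original counts are needed again
restored = np.expm1(scaled)
```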

In [27]:
fig = px.scatter_geo(gdf.fillna(0), locations="country", locationmode='country names', 
                     color="tempC", size='confirmed_log1p', hover_name="country", 
                     range_color= [-20, 45], 
                     projection="natural earth", animation_frame="date", 
                     title='COVID-19: log1p(confirmed) VS Temperature by country', color_continuous_scale="portland", template="plotly_dark")
# fig.update(layout_coloraxis_showscale=False)
fig.show()
Deaths Vs. Temperature
In [28]:
fig = px.scatter_geo(gdf.fillna(0), locations="country", locationmode='country names', 
                     color="tempC", size='deaths', hover_name="country", 
                     range_color= [-20, 45], 
                     projection="natural earth", animation_frame="date", 
                     title='COVID-19: deaths VS temperature by country', color_continuous_scale="portland", template="plotly_dark")
# fig.update(layout_coloraxis_showscale=False)
fig.show()

The number of deaths was high in China, Europe, the US and Iran. Even though these are northern, i.e. cooler, regions, the high death counts might be due to high population density. In the USA, most of the deaths occurred in New York (one of the most crowded states of the USA).

Mortality rate Vs. Temperature

Mortality rate, rather than the total number of deaths, can be checked to see whether weather affects how severe coronavirus infections become.

In [29]:
fig = px.scatter_geo(gdf.fillna(0), locations="country", locationmode='country names', 
                     color="tempC", size='mortality_rate', hover_name="country", 
                     range_color= [-20, 45], 
                     projection="natural earth", animation_frame="date", 
                     title='COVID-19: Mortality rate VS Temperature by country', color_continuous_scale="portland", template="plotly_dark")
# fig.update(layout_coloraxis_showscale=False)
fig.show()

We see that mortality rate was not strongly related to region or temperature. Mortality rate was observed to be high at the beginning of the spread in each country (perhaps because the total number of tests was low), but many countries seemed to be converging to a mortality rate of around 3%.

Confirmed Cases Vs. Humidity
In [30]:
fig = px.scatter_geo(gdf.fillna(0), locations="country", locationmode='country names', 
                     color="humidity", size='confirmed_log1p', hover_name="country", 
                     range_color= [0, 100], 
                     projection="natural earth", animation_frame="date", 
                     title='COVID-19: log1p(confirmed) VS Humidity by country', color_continuous_scale="portland", template="plotly_dark")
# fig.update(layout_coloraxis_showscale=False)
fig.show()

The spread was seen not only in China, where humidity was low, but also in Europe, where humidity was high. Thus, humidity did not seem to affect the propagation of COVID-19 in any way.

Mortality Rate Vs. Humidity by Country
In [31]:
fig = px.scatter_geo(gdf.fillna(0), locations="country", locationmode='country names', 
                     color="humidity", size='mortality_rate', hover_name="country", 
                     range_color= [0, 100], 
                     projection="natural earth", animation_frame="date", 
                     title='COVID-19: Mortality rate VS humidity by country', color_continuous_scale="portland", template="plotly_dark")
# fig.update(layout_coloraxis_showscale=False)
fig.show()

No conclusive evidence was found to establish any correlation between humidity and the mortality rate of COVID-19.

Windspeed

Visualizing relationship between wind speed and Corona spread.

In [32]:
fig = px.scatter_geo(gdf.fillna(0), locations="country", locationmode='country names', 
                     color="windspeedKmph", size='confirmed_log1p', hover_name="country", 
                     range_color= [0, 40], 
                     projection="natural earth", animation_frame="date", 
                     title='COVID-19: log1p(Confirmed) VS Wind speed by country', color_continuous_scale="portland", template = "plotly_dark")
# fig.update(layout_coloraxis_showscale=False)
fig.show()

Could the relatively high wind speed be a reason for the rapid spread of COVID-19 across the Europe region in the short term?

From this data analysis, it can be concluded that weather changes hardly affected the wide spread of the coronavirus.
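The conclusion above rests on visual inspection of the maps. A rank correlation is one rough way to check it numerically. The sketch below uses a small synthetic frame (the values are made up, not taken from the dataset) and computes Spearman's coefficient as the Pearson correlation of the ranks; the `spearman` helper is hypothetical. With the real data, the same function could be applied to `gdf['tempC']` and `gdf['confirmed_log1p']`, keeping in mind that a correlation alone does not establish a causal effect.

```python
import pandas as pd

# Synthetic example values (assumption): temperature in C and log1p(confirmed)
toy = pd.DataFrame({
    "tempC": [-5, 3, 8, 15, 21, 27, 32],
    "confirmed_log1p": [4.1, 7.9, 2.3, 8.8, 3.0, 6.5, 5.2],
})

def spearman(a, b):
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    return a.rank().corr(b.rank())

rho = spearman(toy["tempC"], toy["confirmed_log1p"])
print(f"Spearman rho: {rho:.2f}")
```

A value near zero would be consistent with "hardly any relation", while values near +1 or -1 would indicate a strong monotonic relation.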

COVID_19 Countrywise EDA:

In [33]:
data = cleaned_data
data = pd.merge(data, countries_df, on='country')
In [34]:
# latest
full_latest = data[data['date'] == max(data['date'])].reset_index()
# China rows are taken from the latest snapshot too, so the sums below are not inflated across dates
china_latest = full_latest[full_latest['country']=='China']
row_latest = full_latest[full_latest['country']!='China']

# latest condensed (columns selected with a list, as recent pandas versions require)
full_latest_grouped = full_latest.groupby('country')[['confirmed', 'deaths', 'recovered', 'active']].sum().reset_index()
china_latest_grouped = china_latest.groupby('state')[['confirmed', 'deaths', 'recovered', 'active']].sum().reset_index()
row_latest_grouped = row_latest.groupby('country')[['confirmed', 'deaths', 'recovered', 'active']].sum().reset_index()
In [35]:
# Worldwide totals on the latest date
temp = data.groupby('date')[['confirmed', 'deaths', 'recovered', 'active']].sum().reset_index()
temp = temp[temp['date']==max(temp['date'])].reset_index(drop=True)
temp.style.background_gradient(cmap='Pastel1')
Out[35]:
date confirmed deaths recovered active
0 2020-03-28 00:00:00 656708 30621 138286 487801
In [36]:
# Analysing Corona Cases Country-wise:
temp_f = full_latest_grouped.sort_values(by='confirmed', ascending=False)
temp_f = temp_f.reset_index(drop=True)
temp_f.style.background_gradient(cmap='Reds')
Out[36]:
country confirmed deaths recovered active
0 US 121478 2026 1072 118380
1 Italy 92472 10023 12384 70065
2 China 81999 3299 75100 3600
3 Spain 73235 5982 12285 54968
4 Germany 57695 433 8481 48781
5 France 38105 2317 5724 30064
6 Iran 35408 2517 11679 21212
7 United Kingdom 17312 1021 151 16140
8 Switzerland 14076 264 1530 12282
9 Netherlands 9819 640 6 9173
10 South Korea 9478 144 4811 4523
11 Belgium 9134 353 1063 7718
12 Austria 8271 68 225 7978
13 Turkey 7402 108 70 7224
14 Canada 5576 61 0 5515
15 Portugal 5170 100 43 5027
16 Norway 4015 23 7 3985
17 Brazil 3904 111 6 3787
18 Australia 3640 14 244 3382
19 Israel 3619 12 89 3518
20 Sweden 3447 105 16 3326
21 Ireland 2415 36 5 2374
22 Denmark 2366 65 57 2244
23 Malaysia 2320 27 320 1973
24 Chile 1909 6 61 1842
25 Luxembourg 1831 18 40 1773
26 Ecuador 1823 48 3 1772
27 Japan 1693 52 404 1237
28 Poland 1638 18 7 1613
29 Pakistan 1495 12 29 1454
30 Romania 1452 37 139 1276
31 Russia 1264 4 49 1211
32 Thailand 1245 6 97 1142
33 Saudi Arabia 1203 4 37 1162
34 South Africa 1187 1 31 1155
35 Finland 1167 9 10 1148
36 Indonesia 1155 102 59 994
37 Philippines 1075 68 35 972
38 Greece 1061 32 52 977
39 India 987 24 84 879
40 Iceland 963 2 114 847
41 Singapore 802 2 198 602
42 Panama 786 14 2 770
43 Dominican Republic 719 28 3 688
44 Mexico 717 12 4 701
45 Argentina 690 18 72 600
46 Slovenia 684 9 10 665
47 Peru 671 16 16 639
48 Serbia 659 10 0 649
49 Croatia 657 5 45 607
50 Estonia 645 1 20 624
51 Colombia 608 6 10 592
52 Qatar 590 1 45 544
53 Egypt 576 36 121 419
54 Iraq 506 42 131 333
55 Bahrain 476 4 265 207
56 United Arab Emirates 468 2 52 414
57 Algeria 454 29 31 394
58 New Zealand 451 0 50 401
59 Lebanon 412 8 30 374
60 Armenia 407 1 30 376
61 Morocco 402 25 11 366
62 Lithuania 394 7 1 386
63 Ukraine 356 9 5 342
64 Hungary 343 11 34 298
65 Bulgaria 331 7 11 313
66 Andorra 308 3 1 304
67 Latvia 305 0 1 304
68 Costa Rica 295 2 3 290
69 Slovakia 292 0 2 290
70 Tunisia 278 8 2 268
71 Uruguay 274 0 0 274
72 Bosnia and Herzegovina 258 5 5 248
73 Jordan 246 1 18 227
74 North Macedonia 241 4 3 234
75 Kuwait 235 0 64 171
76 Moldova 231 2 2 227
77 Kazakhstan 228 1 16 211
78 San Marino 224 22 6 196
79 Burkina Faso 207 11 21 175
80 Albania 197 10 31 156
81 Azerbaijan 182 4 15 163
82 Cyprus 179 5 15 159
83 Vietnam 174 0 21 153
84 Oman 152 0 23 129
85 Malta 149 0 2 147
86 Ghana 141 5 2 134
87 Senegal 130 0 18 112
88 Brunei 120 1 25 94
89 Venezuela 119 2 39 78
90 Cuba 119 3 4 112
91 Sri Lanka 113 1 9 103
92 Afghanistan 110 4 2 104
93 Uzbekistan 104 2 5 97
94 Mauritius 102 2 0 100
95 Cambodia 99 0 13 86
96 Honduras 95 1 3 91
97 Belarus 94 0 32 62
98 Cameroon 91 2 2 87
99 Georgia 90 0 14 76
100 Nigeria 89 1 3 85
101 Montenegro 84 1 0 83
102 Trinidad and Tobago 74 3 1 70
103 Bolivia 74 0 0 74
104 Rwanda 60 0 0 60
105 Kyrgyzstan 58 0 0 58
106 Paraguay 56 3 1 52
107 Liechtenstein 56 0 0 56
108 Bangladesh 48 5 15 28
109 Monaco 42 0 1 41
110 Kenya 38 1 1 36
111 Guatemala 34 1 10 23
112 Jamaica 30 1 2 27
113 Uganda 30 0 0 30
114 Zambia 28 0 0 28
115 Barbados 26 0 0 26
116 Madagascar 26 0 0 26
117 Togo 25 1 1 23
118 El Salvador 19 0 0 19
119 Mali 18 0 0 18
120 Maldives 16 0 9 7
121 Ethiopia 16 0 1 15
122 Tanzania 14 0 1 13
123 Djibouti 14 0 0 14
124 Mongolia 12 0 0 12
125 Equatorial Guinea 12 0 0 12
126 Dominica 11 0 0 11
127 Niger 10 1 0 9
128 Bahamas 10 0 1 9
129 Eswatini 9 0 0 9
130 Guinea 8 0 0 8
131 Guyana 8 1 0 7
132 Namibia 8 0 2 6
133 Mozambique 8 0 0 8
134 Laos 8 0 0 8
135 Seychelles 8 0 0 8
136 Suriname 8 0 0 8
137 Haiti 8 0 0 8
138 Antigua and Barbuda 7 0 0 7
139 Zimbabwe 7 1 0 6
140 Gabon 7 1 0 6
141 Grenada 7 0 0 7
142 Holy See 6 0 0 6
143 Eritrea 6 0 0 6
144 Benin 6 0 0 6
145 Fiji 5 0 0 5
146 Sudan 5 1 0 4
147 Mauritania 5 0 0 5
148 Cabo Verde 5 1 0 4
149 Syria 5 0 0 5
150 Angola 5 0 0 5
151 Nepal 5 0 1 4
152 Nicaragua 4 1 0 3
153 Libya 3 0 0 3
154 Liberia 3 0 0 3
155 Central African Republic 3 0 0 3
156 Chad 3 0 0 3
157 Gambia 3 1 0 2
158 Saint Lucia 3 0 1 2
159 Somalia 3 0 0 3
160 Bhutan 3 0 0 3
161 Guinea-Bissau 2 0 0 2
162 Belize 2 0 0 2
163 Papua New Guinea 1 0 0 1
164 Timor-Leste 1 0 0 1
Top 5 Nations with most Corona Cases:

Country   Recovery Rate   Death Rate   Active Cases
USA               0.85%        1.50%         97.00%
Italy            12.66%       10.56%         76.78%
China            91.23%        4.02%          4.74%
Spain            14.23%        7.80%         78.94%
Germany          13.08%        0.67%         86.23%
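Each of these rates is a share of confirmed cases: recovered / confirmed, deaths / confirmed and active / confirmed. The sketch below computes them from the 28 March rows of the country table earlier in this section; the column names of `rates` are illustrative. The results need not reproduce the percentages quoted above exactly if those were taken from a different snapshot.

```python
import pandas as pd

# Rows copied from the country table above (28 March 2020 snapshot)
top5 = pd.DataFrame(
    {"confirmed": [121478, 92472, 81999, 73235, 57695],
     "deaths": [2026, 10023, 3299, 5982, 433],
     "recovered": [1072, 12384, 75100, 12285, 8481],
     "active": [118380, 70065, 3600, 54968, 48781]},
    index=["US", "Italy", "China", "Spain", "Germany"],
)

# Divide each count by confirmed cases row by row, then express as a percentage
rates = top5[["recovered", "deaths", "active"]].div(top5["confirmed"], axis=0) * 100
rates.columns = ["recovery_rate", "death_rate", "active_rate"]
print(rates.round(2))
```

Because active = confirmed - deaths - recovered, the three rates in each row add up to 100%.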

Visualizing Total Covid-19 Cases world-wide:

In [37]:
countries = np.unique(temp_f['country'])
# Total confirmed cases per country, in the same order as `countries`
total_conf = [temp_f.loc[temp_f['country'] == country, 'confirmed'].sum()
              for country in countries]

# Building the choropleth trace (built once, outside any loop)
map_data = [dict(
    type='choropleth',
    locations=countries,
    z=total_conf,
    locationmode='country names',
    text=countries,
    marker=dict(line=dict(color='rgb(0,0,0)', width=1)),
    colorbar=dict(tickprefix='', title='Count'),
)]

# Building the visual
layout = dict(
    title='COVID-19 Confirmed Cases',
    geo=dict(
        showframe=False,
        showocean=True,
        oceancolor='rgb(0,255,255)',
        projection=dict(
            type='orthographic',
            rotation=dict(lon=60, lat=10),
        ),
        lonaxis=dict(showgrid=True, gridcolor='rgb(102, 102, 102)'),
        lataxis=dict(showgrid=True, gridcolor='rgb(102, 102, 102)'),
    ),
)

fig = dict(data=map_data, layout=layout)
py.iplot(fig, validate=False, filename='worldmap')

COVID-19 "INDIA"

Loading Dataset:
In [39]:
covid_df = pd.read_csv('covid_19_india.csv')
covid_df.head()
Out[39]:
Sno Date Time State/UnionTerritory ConfirmedIndianNational ConfirmedForeignNational Cured Deaths
0 1 30/01/20 6:00 PM Kerala 1 0 0 0
1 2 31/01/20 6:00 PM Kerala 1 0 0 0
2 3 01/02/20 6:00 PM Kerala 2 0 0 0
3 4 02/02/20 6:00 PM Kerala 3 0 0 0
4 5 03/02/20 6:00 PM Kerala 3 0 0 0
In [38]:
#Importing necessary libraries and loading related data files:
In [40]:
import geopandas as gpd
import seaborn as sns
sns.set_style('dark')

map_df = gpd.read_file('Indian_States.shp')
map_df.loc[0,['st_nm']] = 'Andaman and Nicobar Islands'
map_df.head()
Out[40]:
st_nm geometry
0 Andaman and Nicobar Islands MULTIPOLYGON (((93.71976 7.20707, 93.71909 7.2...
1 Arunanchal Pradesh POLYGON ((96.16261 29.38078, 96.16860 29.37432...
2 Assam MULTIPOLYGON (((89.74323 26.30362, 89.74290 26...
3 Bihar MULTIPOLYGON (((84.50720 24.26323, 84.50355 24...
4 Chandigarh POLYGON ((76.84147 30.75996, 76.83599 30.73623...
In [41]:
df_india = pd.read_csv('covid_19_india.csv')
df_ind_bed =  pd.read_csv('HospitalBedsIndia.csv')
df_ind_ICMR =  pd.read_csv('ICMRTestingDetails.csv')
df_ind_indiv =  pd.read_csv('IndividualDetails.csv')
df_ind_census =  pd.read_csv('population_india_census2011.csv')

df_india['Confirmed'] = df_india['ConfirmedIndianNational']+ df_india['ConfirmedForeignNational']
In [42]:
df_forMap = df_india.drop(columns=['Date','Sno']).groupby('State/UnionTerritory').sum()
In [43]:
merged = map_df.set_index('st_nm').join(df_forMap)
#merged.fillna(0)
In [44]:
import matplotlib.pyplot as plt
%matplotlib inline
!pip install descartes
fig, ax = plt.subplots(5, figsize=(9, 45))


topic = ['Confirmed','ConfirmedIndianNational','ConfirmedForeignNational','Cured','Deaths']
cmaps = ['Oranges','Blues', 'Purples', 'Greens', 'Reds']
for i,l in enumerate(topic):
    ax[i].axis('off')
    ax[i].set_title('{} Cases of COVID 19 in India'.format(l), fontdict={'fontsize': '20', 'fontweight' : '5'})
    

    merged.plot(column=l, cmap=cmaps[i], linewidth=0.8, ax=ax[i], edgecolor='0.75', legend=True)

Highlights from the above charts:

Confirmed Cases: Maharashtra and Kerala topped the list, followed by Uttar Pradesh, Rajasthan, Karnataka, Delhi, Gujarat and other states.

Foreign Nationals: A few of the confirmed cases were foreign nationals, reported mainly from Haryana, Rajasthan, Maharashtra and Kerala.

Deaths: The majority of reported deaths were in Maharashtra, Karnataka, Gujarat and Haryana.

Cured: On a positive note, recoveries were recorded in Uttar Pradesh, Kerala, Rajasthan and Haryana.

In [46]:
#Data Processing
#Plotting Daily Rise in Cases assorted by Confirmed, Cured and Deaths
In [61]:
import plotly.graph_objs as go
df_datechart = df_india.drop(columns=['State/UnionTerritory','Sno']).groupby('Date').sum()

cnf_data = go.Bar(x=df_datechart.index,
                         y=df_datechart.Confirmed,hovertext='Confirmed', name = "Confirmed")
dea_data = go.Bar(x=df_datechart.index,
                         y=df_datechart.Deaths,hovertext='Deaths',
                     yaxis='y2', name = "Deaths")
# Cured is drawn against yaxis3, which the layout below defines for it
rec_data = go.Bar(x=df_datechart.index,
                         y=df_datechart.Cured,hovertext='Cured',
                     yaxis='y3', name = "Cured")

layout = go.Layout(title='COVID-19 progression in India', xaxis=dict(title='Date'),
                   yaxis=dict(title='Confirmed',color='blue'),
                  yaxis2=dict(title='Death', color='red',
                               overlaying='y', side='right'),
                  yaxis3=dict(title='   Cured', color='green',
                               overlaying='y'),
                  template="plotly_dark")

fig = go.Figure(data=[cnf_data,dea_data,rec_data], layout=layout)
fig.update_traces(marker_line_width=1.5, opacity=0.7)
fig.show()

Case figures began to surge in the first week of March and rose continuously with each passing day. On average, 15 new cases were recorded daily until the second week of March, after which growth became exponential and the death toll increased.

As of 28th March, India had:

Confirmed Cases: 873

Cured/Recovered/Migrated: 79

Deaths: 19
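
A note on ordering: the `Date` column in `covid_19_india.csv` holds `dd/mm/yy` strings (as the `head()` output shows), so grouping on the raw strings orders them lexicographically rather than chronologically. The sketch below uses only the standard library and four sample dates written in that format to show the difference. Parsing the column first, for example with `pd.to_datetime(..., dayfirst=True)` as done later in this notebook, is one way to avoid the issue.

```python
from datetime import datetime

# Sample dd/mm/yy strings in the same format as the Date column
raw_dates = ["30/01/20", "01/02/20", "02/03/20", "15/02/20"]

# Sorting the raw strings compares characters, not calendar dates
lexicographic = sorted(raw_dates)

# Parsing first gives the intended chronological order
chronological = sorted(raw_dates, key=lambda s: datetime.strptime(s, "%d/%m/%y"))

print(lexicographic)
print(chronological)
```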

Visualizing State wise Spread of Corona Virus:

In [62]:
df_ind=df_india.groupby(['State/UnionTerritory',"Date"]).head()
States=np.unique(df_ind['State/UnionTerritory'].values)
States
Out[62]:
array(['Andaman and Nicobar Islands', 'Andhra Pradesh', 'Bihar',
       'Chandigarh', 'Chattisgarh', 'Chhattisgarh', 'Delhi', 'Goa',
       'Gujarat', 'Haryana', 'Himachal Pradesh', 'Jammu and Kashmir',
       'Karnataka', 'Kerala', 'Ladakh', 'Madhya Pradesh', 'Maharashtra',
       'Manipur', 'Mizoram', 'Odisha', 'Pondicherry', 'Puducherry',
       'Punjab', 'Rajasthan', 'Tamil Nadu', 'Telengana', 'Uttar Pradesh',
       'Uttarakhand', 'West Bengal'], dtype=object)
In [63]:
top5aff_states=df_ind.groupby(['State/UnionTerritory']).max().sort_values(['ConfirmedIndianNational'],ascending=False)[:5].index.values
In [64]:
dates=df_ind[df_ind['State/UnionTerritory'] == 'Kerala']['Date'].values
In [65]:
for state in States:
    df1=df_ind[df_ind['State/UnionTerritory'] == state]
    
    # Position of this state's first report within the reference (Kerala) dates
    rec_date_idx=np.where(dates==df1['Date'].values[0])[0][0]
    if rec_date_idx >0:
        # Pad the dates before the first report with zero counts
        df2=pd.DataFrame()
        df2['Date']=dates[:rec_date_idx]
        df2['ConfirmedIndianNational'] =  np.zeros(rec_date_idx)
        df2['ConfirmedForeignNational'] = np.zeros(rec_date_idx)
        df2['Cured']=np.zeros(rec_date_idx)
        df2['Deaths']=np.zeros(rec_date_idx)
        df2['State/UnionTerritory']=state
        # DataFrame.append was removed in pandas 2.0; pd.concat works in old and new versions
        df2=pd.concat([df2,df1],ignore_index=True)
    else:
        df2=df1
    df2.to_csv(state+'.csv',index=False)
In [66]:
plt.figure(figsize=(10,10))

for state in States:
    df1=pd.read_csv(''+state+'.csv')
    df1=df1[30:]
    plt.plot(df1['Date'],df1['ConfirmedIndianNational'],"*-",label=state)
    #np.savetxt(state+'.txt',df1['ConfirmedIndianNational']+df1['ConfirmedForeignNational'])
plt.xticks(rotation=90)
plt.legend()
plt.savefig('indian_states.png')

Maharashtra and Kerala were the worst affected states.

From 14th March to 28th March, Maharashtra recorded approximately 170 new cases, with a steep rise in the daily count over that period.

For Kerala, the Covid-19 impact was seen from 20th March; within a week, approximately 150 new cases were recorded.

No new cases were recorded in Tamil Nadu from 25th March to 28th March.
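
The per-state files plotted above are aligned to a common date axis by padding each state's series with zeros for the dates before its first report. A minimal standard-library sketch of that idea follows; the function name `pad_with_zeros` and the sample dates and counts are illustrative assumptions, not values from the dataset.

```python
def pad_with_zeros(all_dates, state_dates, state_counts):
    """Align a state's series to the full date list, filling earlier dates with 0."""
    first = all_dates.index(state_dates[0])   # position of the state's first report
    padded_dates = all_dates[:first] + state_dates
    padded_counts = [0] * first + state_counts
    return padded_dates, padded_counts

# Hypothetical example: the reference series starts on day 1, this state on day 3
all_dates = ["30/01/20", "31/01/20", "01/02/20", "02/02/20"]
dates, counts = pad_with_zeros(all_dates, ["01/02/20", "02/02/20"], [2, 5])

print(dates)
print(counts)
```

As in the notebook's loop, this assumes the state's first date appears in the reference date list; otherwise the lookup fails.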

Analysing Count of Covid-19 Cases by Date:

In [70]:
covid_df = df_india.copy()
covid_df.drop(['Sno'],axis=1,inplace=True)
covid_df.index = range(1,covid_df.shape[0]+1)
covid_india = covid_df.copy()
In [71]:
covid_india['Total Confirmed cases'] = covid_india['ConfirmedIndianNational'] + covid_india['ConfirmedForeignNational']
covid_india['Total Active cases'] = covid_india['ConfirmedIndianNational'] + covid_india['ConfirmedForeignNational'] - covid_india['Cured'] - covid_india['Deaths']
In [72]:
covid_india.rename(columns={"State/UnionTerritory": "States", "ConfirmedIndianNational": "Confirmed cases (Indian Nationals)"},inplace=True)
covid_india.rename(columns={"ConfirmedForeignNational": "Confirmed cases (Foreign Nationals)", "Cured": "Cured/Discharged/Migrated"},inplace=True)
covid_india = covid_india[covid_india.States != 'Chattisgarh']
covid_india = covid_india[covid_india.States != 'Pondicherry']
covid_india = covid_india[covid_india.States != 'Union Territory of Jammu and Kashmir']
covid_india = covid_india[covid_india.States != 'Union Territory of Chandigarh']
covid_india = covid_india[covid_india.States != 'Union Territory of Ladakh']
In [73]:
covid_india.index = range(1,covid_india.shape[0]+1)
indian_states = covid_india.copy()
covid_india['Date'] = pd.to_datetime(covid_india['Date'], dayfirst=True)
covid_india.sort_values(by='Date', inplace=True)
In [74]:
covid_ind = covid_df.copy()
covid_ind['Total Confirmed cases'] = covid_ind['ConfirmedIndianNational'] + covid_ind['ConfirmedForeignNational']
covid_ind['Total Active cases'] = covid_ind['ConfirmedIndianNational'] + covid_ind['ConfirmedForeignNational'] - covid_ind['Cured'] - covid_ind['Deaths']
date_wise_data = covid_ind[["Date","Total Confirmed cases","Deaths","Cured"]]
date_wise_data['Date'] = date_wise_data['Date'].apply(pd.to_datetime, dayfirst=True)
date_wise_data
Out[74]:
Date Total Confirmed cases Deaths Cured
1 2020-01-30 1 0 0
2 2020-01-31 1 0 0
3 2020-02-01 2 0 0
4 2020-02-02 3 0 0
5 2020-02-03 3 0 0
... ... ... ... ...
442 2020-03-28 38 1 2
443 2020-03-28 48 0 1
444 2020-03-28 5 0 0
445 2020-03-28 45 0 11
446 2020-03-28 15 1 0

446 rows × 4 columns

In [75]:
from IPython.display import Markdown
date_wise_data = date_wise_data.groupby(["Date"]).sum().reset_index()
def formatted_text(string):
    display(Markdown(string))
formatted_text('***Date wise data***')
date_wise_data

Date wise data

Out[75]:
Date Total Confirmed cases Deaths Cured
0 2020-01-30 1 0 0
1 2020-01-31 1 0 0
2 2020-02-01 2 0 0
3 2020-02-02 3 0 0
4 2020-02-03 3 0 0
5 2020-02-04 3 0 0
6 2020-02-05 3 0 0
7 2020-02-06 3 0 0
8 2020-02-07 3 0 0
9 2020-02-08 3 0 0
10 2020-02-09 3 0 0
11 2020-02-10 3 0 0
12 2020-02-11 3 0 0
13 2020-02-12 3 0 0
14 2020-02-13 3 0 0
15 2020-02-14 3 0 0
16 2020-02-15 3 0 0
17 2020-02-16 3 0 0
18 2020-02-17 3 0 0
19 2020-02-18 3 0 0
20 2020-02-19 3 0 0
21 2020-02-20 3 0 0
22 2020-02-21 3 0 0
23 2020-02-22 3 0 0
24 2020-02-23 3 0 0
25 2020-02-24 3 0 0
26 2020-02-25 3 0 0
27 2020-02-26 3 0 0
28 2020-02-27 3 0 0
29 2020-02-28 3 0 0
30 2020-02-29 3 0 0
31 2020-03-01 3 0 0
32 2020-03-02 5 0 0
33 2020-03-03 6 0 3
34 2020-03-04 28 0 3
35 2020-03-05 30 0 3
36 2020-03-06 31 0 3
37 2020-03-07 34 0 3
38 2020-03-08 39 0 3
39 2020-03-09 46 0 3
40 2020-03-10 58 0 3
41 2020-03-11 60 0 3
42 2020-03-12 74 0 3
43 2020-03-13 81 1 3
44 2020-03-14 84 2 10
45 2020-03-15 110 2 13
46 2020-03-16 114 2 13
47 2020-03-17 137 3 14
48 2020-03-18 151 3 14
49 2020-03-19 173 4 20
50 2020-03-20 223 4 23
51 2020-03-21 283 4 23
52 2020-03-22 360 7 24
53 2020-03-23 433 7 24
54 2020-03-24 519 9 40
55 2020-03-25 606 10 43
56 2020-03-26 694 15 45
57 2020-03-27 724 17 67
58 2020-03-28 873 19 79
In [77]:
#Time- Bound Cases Visualization:
In [76]:
import plotly.offline as py
import plotly.express as px
temp = date_wise_data.melt(id_vars="Date", value_vars=['Cured', 'Deaths', 'Total Confirmed cases'],
                 var_name='Case', value_name='Count')

fig = px.area(temp, x="Date", y="Count", color='Case',title='Time wise cases analysis', color_discrete_sequence = ['#21bf73', '#ff2e63', '#fe9801'])
fig.show()

A significant rise in new cases was seen from 1st March 2020.

During March, the count soared from 3 to 873 by 28th March.

The majority of cases were still active at that point: 19 deaths had been reported and 79 patients had recovered from Covid-19 by 28th March.
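
The rise described above can be quantified by differencing the cumulative totals. The sketch below takes the last five cumulative confirmed counts from the date-wise table (24th to 28th March) and derives daily new cases and a simple day-over-day growth factor; the variable names are illustrative.

```python
# Cumulative confirmed cases for 24-28 March 2020, copied from the date-wise table
cumulative = [519, 606, 694, 724, 873]

# Daily new cases are the differences between consecutive cumulative totals
daily_new = [b - a for a, b in zip(cumulative, cumulative[1:])]

# Day-over-day growth factor: today's total divided by yesterday's
growth = [round(b / a, 3) for a, b in zip(cumulative, cumulative[1:])]

print(daily_new)
print(growth)
```

A growth factor that stays above 1 means the cumulative count is still climbing; the larger the factor, the faster the climb.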

In [79]:
#Tree Map:
In [78]:
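# Select several columns from a groupby with a list (double brackets); tuple-style selection fails in recent pandas
statewise_cases = pd.DataFrame(covid_ind.groupby(['State/UnionTerritory'])[['Total Confirmed cases', 'Deaths', 'Cured']].max().reset_index())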
#statewise_cases["Country"] = "India" # in order to have a single root node
fig = px.treemap(statewise_cases, path=['State/UnionTerritory'], values='Total Confirmed cases',
                  color='Total Confirmed cases', hover_data=['State/UnionTerritory'],
                  color_continuous_scale='RdBu')
fig.show()
In [80]:
#Data Processing
In [81]:
covid_ind.head()
covid_ind['Total Cases'] = covid_ind['ConfirmedIndianNational'] + covid_ind['ConfirmedForeignNational']

# Adding Active Cases
covid_ind['Active Cases'] = covid_ind['Total Cases'] - covid_ind['Cured'] - covid_ind['Deaths']

# Renaming Column Names
covid_ind.rename(columns = {'Cured':'Cured/Discharged/Migrated'}, inplace = True)
In [82]:
# Create Temp DF 
temp_df = covid_ind[covid_ind['Date']=='28/03/20']

# Statewise Total Cases
df_statewise = temp_df.groupby(['State/UnionTerritory', 'ConfirmedIndianNational', 'ConfirmedForeignNational',  'Cured/Discharged/Migrated'\
                      , 'Deaths', 'Active Cases'])['Total Cases'].sum().reset_index()
df_statewise
Out[82]:
State/UnionTerritory ConfirmedIndianNational ConfirmedForeignNational Cured/Discharged/Migrated Deaths Active Cases Total Cases
0 Andaman and Nicobar Islands 2 0 0 0 2 2
1 Andhra Pradesh 14 0 1 0 13 14
2 Bihar 9 0 0 1 8 9
3 Chandigarh 7 0 0 0 7 7
4 Chhattisgarh 6 0 0 0 6 6
5 Delhi 38 1 6 1 32 39
6 Goa 3 0 0 0 3 3
7 Gujarat 44 1 0 3 42 45
8 Haryana 19 14 11 0 22 33
9 Himachal Pradesh 3 0 0 1 2 3
10 Jammu and Kashmir 18 0 1 1 16 18
11 Karnataka 55 0 3 2 50 55
12 Kerala 165 8 11 0 162 173
13 Ladakh 13 0 3 0 10 13
14 Madhya Pradesh 30 0 0 2 28 30
15 Maharashtra 177 3 25 5 150 180
16 Manipur 1 0 0 0 1 1
17 Mizoram 1 0 0 0 1 1
18 Odisha 3 0 0 0 3 3
19 Puducherry 1 0 0 0 1 1
20 Punjab 38 0 1 1 36 38
21 Rajasthan 46 2 3 0 45 48
22 Tamil Nadu 32 6 2 1 35 38
23 Telengana 38 10 1 0 47 48
24 Uttar Pradesh 44 1 11 0 34 45
25 Uttarakhand 4 1 0 0 5 5
26 West Bengal 15 0 0 1 14 15
In [83]:
# Function for highlighting the maximum value in a column
def highlight_max_count(count):
    is_max = count == count.max()
    return ['background-color: #1f77b4' if v else '' for v in is_max]

# Distribution of Cases in India
df_statewise.style \
    .background_gradient(cmap="Blues", subset=['ConfirmedIndianNational', 'ConfirmedForeignNational', 'Total Cases', 'Active Cases'])\
    .background_gradient(cmap="Greens", subset=['Cured/Discharged/Migrated'])\
    .background_gradient(cmap="Reds", subset=['Deaths'])
In [84]:
# Statewise 
x = df_statewise.groupby('State/UnionTerritory')['Active Cases'].sum().sort_values(ascending=False).to_frame()
x.style.background_gradient(cmap='Reds')
Out[84]:
Active Cases
State/UnionTerritory
Kerala 162
Maharashtra 150
Karnataka 50
Telengana 47
Rajasthan 45
Gujarat 42
Punjab 36
Tamil Nadu 35
Uttar Pradesh 34
Delhi 32
Madhya Pradesh 28
Haryana 22
Jammu and Kashmir 16
West Bengal 14
Andhra Pradesh 13
Ladakh 10
Bihar 8
Chandigarh 7
Chhattisgarh 6
Uttarakhand 5
Goa 3
Odisha 3
Himachal Pradesh 2
Andaman and Nicobar Islands 2
Manipur 1
Mizoram 1
Puducherry 1

Visualizing Mortality Rate of Covid-19 Statewise:

In [85]:
#covid_ind.drop('ConfirmedIndianNational',axis = 1,inplace=True)
#covid_ind.drop('ConfirmedForeignNational',axis = 1,inplace=True)
#covid_ind
temp = df_statewise.copy()
#temp.drop('ConfirmedIndianNational',axis = 1,inplace=True)
temp.drop('ConfirmedForeignNational',axis = 1,inplace=True)

temp = temp.sort_values(by='Total Cases', ascending=False)
temp = temp[['State/UnionTerritory', 'Total Cases', 'Active Cases', 'Deaths', 'Cured/Discharged/Migrated']]
temp['Mortality Rate'] = round((temp['Deaths']/temp['Total Cases'])*100,2)
temp = temp.reset_index(drop=True)

temp.head(10)

temp.style.background_gradient(cmap="Reds", subset=['Total Cases', 'Active Cases'])\
            .background_gradient(cmap="Greens", subset=['Cured/Discharged/Migrated'])\
            .background_gradient(cmap="Oranges_r", subset=['Deaths'])\
            .background_gradient(cmap="seismic_r",subset=['Mortality Rate'])
Out[85]:
State/UnionTerritory Total Cases Active Cases Deaths Cured/Discharged/Migrated Mortality Rate
0 Maharashtra 180 150 5 25 2.78
1 Kerala 173 162 0 11 0
2 Karnataka 55 50 2 3 3.64
3 Telengana 48 47 0 1 0
4 Rajasthan 48 45 0 3 0
5 Uttar Pradesh 45 34 0 11 0
6 Gujarat 45 42 3 0 6.67
7 Delhi 39 32 1 6 2.56
8 Tamil Nadu 38 35 1 2 2.63
9 Punjab 38 36 1 1 2.63
10 Haryana 33 22 0 11 0
11 Madhya Pradesh 30 28 2 0 6.67
12 Jammu and Kashmir 18 16 1 1 5.56
13 West Bengal 15 14 1 0 6.67
14 Andhra Pradesh 14 13 0 1 0
15 Ladakh 13 10 0 3 0
16 Bihar 9 8 1 0 11.11
17 Chandigarh 7 7 0 0 0
18 Chhattisgarh 6 6 0 0 0
19 Uttarakhand 5 5 0 0 0
20 Odisha 3 3 0 0 0
21 Himachal Pradesh 3 2 1 0 33.33
22 Goa 3 3 0 0 0
23 Andaman and Nicobar Islands 2 2 0 0 0
24 Manipur 1 1 0 0 0
25 Mizoram 1 1 0 0 0
26 Puducherry 1 1 0 0 0
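
The `Mortality Rate` column above is deaths divided by total cases, expressed as a percentage. A small standard-library sketch reproduces three rows of the table; the helper name `mortality_rate` and the zero-case guard are additions for illustration.

```python
def mortality_rate(deaths, total_cases):
    """Deaths as a percentage of total cases, rounded to 2 decimals (0.0 if no cases)."""
    if total_cases == 0:
        return 0.0
    return round(deaths / total_cases * 100, 2)

# (deaths, total cases) on 28 March, taken from the table above
rows = {"Maharashtra": (5, 180), "Gujarat": (3, 45), "Himachal Pradesh": (1, 3)}
rates = {state: mortality_rate(d, t) for state, (d, t) in rows.items()}

print(rates)
```

With very small case counts a single death dominates the rate, as Himachal Pradesh shows, so the column is best read alongside `Total Cases`.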
In [86]:
%%HTML
<div class='tableauPlaceholder' id='viz1585145553118' style='position: relative'>
    <noscript>
        <a href='#'>
        <img alt=' ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Bo&#47;Book1_31496&#47;Dashboard3&#47;1_rss.png' style='border: none' />
        </a>
    </noscript>
    <object class='tableauViz'  style='display:none;'>
        <param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' />
        <param name='embed_code_version' value='3' />
        <param name='site_root' value='' />
        <param name='name' value='Book1_31496&#47;Dashboard3' />
        <param name='tabs' value='no' />
        <param name='toolbar' value='yes' />
        <param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Bo&#47;Book1_31496&#47;Dashboard3&#47;1.png' />
        <param name='animate_transition' value='yes' />
        <param name='display_static_image' value='yes' />
        <param name='display_spinner' value='yes' />
        <param name='display_overlay' value='yes' />
        <param name='display_count' value='yes' />
        <param name='filter' value='publish=yes' />
    </object>
</div>
<script type='text/javascript'>
    var divElement = document.getElementById('viz1585145553118');
    var vizElement = divElement.getElementsByTagName('object')[0];
    if ( divElement.offsetWidth > 800 ) 
        { 
            vizElement.style.minWidth='420px';
            vizElement.style.maxWidth='650px';
            vizElement.style.width='100%';
            vizElement.style.minHeight='587px';
            vizElement.style.maxHeight='887px';
            vizElement.style.height=(divElement.offsetWidth*0.75)+'px';
        }
    else if ( divElement.offsetWidth > 500 )
        { 
            vizElement.style.minWidth='420px';
            vizElement.style.maxWidth='650px';
            vizElement.style.width='100%';
            vizElement.style.minHeight='587px';
            vizElement.style.maxHeight='887px';
            vizElement.style.height=(divElement.offsetWidth*0.75)+'px';
        } 
    else 
        { 
            vizElement.style.width='100%';
            vizElement.style.height='727px';
        }
var scriptElement = document.createElement('script');
scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';
vizElement.parentNode.insertBefore(scriptElement, vizElement);
</script>

Maharashtra and Kerala reported the most cases. No cases were reported in North East India except in a few states such as Manipur and Mizoram (1 each).

This visualization showed the spread of the virus in India and the formation of clusters. Two clusters (dark shades) were especially pronounced: Kerala and Maharashtra, with 186 and 182 confirmed cases respectively, and 6 and 1 deaths respectively. Kerala was the first affected place in India. Small clusters were forming in North India. Given India's population density, these could prove harmful and merge into a single massive cluster if people neglected good practices such as self-quarantine and sanitization. The clusters in the rest of India were sparsely situated (light shades), so proper caution would gradually lead to the death of those clusters.

Visualizing Current Situation:

In [87]:
#Overall 
#df_india = pd.read_csv('covid_19_india.csv')


#cov_ind = df_india.copy()
#cov_ind['Confirmed'] = cov_ind['ConfirmedIndianNational'] + cov_ind['ConfirmedForeignNational']
#cov_ind['Active'] = cov_ind['ConfirmedIndianNational'] + cov_ind['ConfirmedForeignNational'] - cov_ind['Cured'] - cov_ind['Deaths']

a_c= temp['Active Cases'].sum()
r_d = temp['Cured/Discharged/Migrated'].sum()
d_h = temp['Deaths'].sum()
fig = go.Figure(data=[go.Pie(labels=['Active Cases','Cured','Death'],
                             values= [a_c,r_d,d_h],hole =.3)])
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=20,
                  marker=dict(colors=['#263fa3', '#2fcc41','#cc3c2f'], line=dict(color='#FFFFFF', width=2)))
fig.update_layout(title_text='Current Situation in India',plot_bgcolor='rgb(275, 270, 273)')
fig.show()
Of all confirmed cases in India:

Active Cases: 775 (88.8%)

Cured/Recovered/Migrated: 79 (9.05%)

Deaths: 19 (2.18%)
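
The percentages above follow directly from the three counts. As a quick check, the sketch below recomputes each share of the total, using only the figures quoted in this section.

```python
# Counts as of 28 March, taken from the pie chart summary above
active, cured, deaths = 775, 79, 19
total = active + cured + deaths

# Each category as a percentage of all confirmed cases
shares = {
    "Active": round(active / total * 100, 2),
    "Cured": round(cured / total * 100, 2),
    "Deaths": round(deaths / total * 100, 2),
}

print(total)
print(shares)
```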

Indian Hospitals Data

In [88]:
#Reading Hospital Data
hospdata=pd.read_csv("HospitalBedsIndia.csv")
In [89]:
#Cleaning Hospital Data:
In [90]:
hospdata=hospdata.drop(['Unnamed: 12', 'Unnamed: 13'], axis=1)
hospdata.rename(columns = {'NumPrimaryHealthCenters_HMIS':'Primary Health Center', 
                           'NumCommunityHealthCenters_HMIS':'Community Health Center',
                           'NumSubDistrictHospitals_HMIS':'Sub District Hospital', 
                           'NumDistrictHospitals_HMIS':'District Hospitals'}, inplace = True) 
hospdata.rename(columns = {'TotalPublicHealthFacilities_HMIS':'Total Public Health Facility', 
                           'NumPublicBeds_HMIS':'Public Beds',
                           'NumRuralHospitals_NHP18':'Rural Hospitals', 
                           'NumRuralBeds_NHP18':'Rural Hosp Beds',
                           'NumUrbanHospitals_NHP18':'Urban Hospitals',
                           'NumUrbanBeds_NHP18':'Urban Hosp Beds'}, inplace = True) 
hospdata1=hospdata.drop([36,37], axis=0)
Visualizing Statewise count of Urban Health Centres:
In [92]:
fig = px.bar(hospdata1.sort_values('Urban Hospitals', ascending=True), 
             x="Urban Hospitals", y="State/UT", title='Total Urban Health Centres', text='Urban Hospitals', orientation='h',width=1000, height=700, range_x = [0, max(hospdata1['Urban Hospitals'])]) 
            
fig.update_traces(marker_color='#46cdcf', opacity=0.8, textposition='inside')

fig.update_layout(plot_bgcolor='rgb(250, 242, 242)')
fig.show()

Urban Hospitals Count:

Tamil Nadu: 525

Maharashtra: 438

Karnataka: 374

Considering each state's area, population and the progression rate of Covid-19, will India be able to restrict the spread of Covid-19 given its medical constraints?

Hospital Beds: Rural and Urban

In [ ]:
fig = px.bar(hospdata1.sort_values('Rural Hospitals', ascending=True), 
             x="Rural Hospitals", y="State/UT", title='Total Rural Health Centers', text='Rural Hospitals', orientation='h',width=1000, height=700, range_x = [0, max(hospdata1['Rural Hospitals'])]) 
            
fig.update_traces(marker_color='#46cdcf', opacity=0.8, textposition='inside')

fig.update_layout(plot_bgcolor='rgb(230, 242, 242)')
fig.show()
In [340]:
sns.set_style("white")
sns.set_context({"figure.figsize": (24, 24)})


# Use hospdata1 for both axes so the x and y values come from the same rows
sns.barplot(x = hospdata1['Urban Hosp Beds'], y = hospdata1['State/UT'], color = "red")


bottom_plot = sns.barplot(x = hospdata1['Rural Hosp Beds'], y = hospdata1['State/UT'], color = "#0000A3", )


topbar = plt.Rectangle((0,0),1,1,fc="red", edgecolor = 'none')
bottombar = plt.Rectangle((0,0),1,1,fc='#0000A3',  edgecolor = 'none')
l = plt.legend([bottombar, topbar], ['Rural Hosp Beds', 'Urban Hosp Beds'], loc=1, ncol = 2, prop={'size':16})
l.draw_frame(False)


sns.despine(left=True)
bottom_plot.set_ylabel("States")
bottom_plot.set_xlabel("Hospital Beds")


for item in ([bottom_plot.xaxis.label, bottom_plot.yaxis.label] +
             bottom_plot.get_xticklabels() + bottom_plot.get_yticklabels()):
    item.set_fontsize(16)

West Bengal had the most hospital beds: 20000 rural and 40000 urban. With those medical facilities, it was well placed to fight the virus.

Karnataka had a total capacity of 50000 hospital beds, of which approximately 21000 were rural beds.

Rajasthan, Madhya Pradesh, Maharashtra and Uttar Pradesh, despite being larger states in terms of area and population, had limited bed capacity. Those states needed to take immediate steps to tackle the Covid-19 pandemic.
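
To compare states on the same footing, the rural and urban figures can be turned into shares of total capacity. The sketch below uses only the approximate bed counts quoted above for West Bengal and Karnataka; the helper name `rural_share` is an illustrative assumption.

```python
def rural_share(rural_beds, total_beds):
    """Rural beds as a percentage of all beds, rounded to 1 decimal."""
    return round(rural_beds / total_beds * 100, 1)

# Approximate figures quoted in the text above
west_bengal_total = 20000 + 40000            # rural + urban
west_bengal = rural_share(20000, west_bengal_total)
karnataka = rural_share(21000, 50000)

print(west_bengal, karnataka)
```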

Statewise Primary Health Centres:

In [97]:
fig = px.bar(hospdata1, x="Primary Health Center", y="State/UT", color='Primary Health Center', orientation='h', height=800,
             title='Primary Health Centre', color_discrete_sequence = px.colors.cyclical.mygbm)

fig.update_layout(plot_bgcolor='rgb(250, 242, 242)')
fig.show()

Primary Health Centres are meant to provide primary medical treatment to patients. Uttar Pradesh had the most Primary Health Centres (3277). Other states need to rise to the cause too.

States' Total Medical Facilities at helm:

In [98]:
fig = px.bar(hospdata1, x="Total Public Health Facility", y="State/UT", color='Total Public Health Facility', orientation='h', height=800,
             title='Total Health Facility in India', color_discrete_sequence = px.colors.cyclical.mygbm)

fig.update_layout(plot_bgcolor='rgb(250, 242, 242)')
fig.show()

Statewise Medical Facilities Preparations:

In [99]:
fig = px.scatter(hospdata1, x="Total Public Health Facility", y="Public Beds", color="State/UT", marginal_y="rug", marginal_x="histogram")
fig
District Hospitals in States:
In [100]:
fig = px.scatter(hospdata1, x="Sub District Hospital", y="District Hospitals", color="State/UT", marginal_y="rug", marginal_x="histogram")
fig

Overall, public health facilities are highest in Uttar Pradesh, followed by Maharashtra and Karnataka. Urban Health Centres are highest in Tamil Nadu. Rural Health Centres are highest in Uttar Pradesh.

ICMR Testing Details:

In [101]:
#Loading Data:
df_hos_bed = pd.read_csv('ICMRTestingDetails.csv')
df_hos_bed['DateTime'] = pd.to_datetime(df_hos_bed['DateTime'])
df_hos_bed['DateTime'] = df_hos_bed['DateTime'].dt.date
df_hos_bed.head()
Out[101]:
SNo DateTime TotalSamplesTested TotalIndividualsTested TotalPositiveCases Source
0 1 2020-03-13 6500 5900 78 Press_Release_ICMR_13March2020.pdf
1 2 2020-03-18 13125 12235 150 ICMR_website_update_18March_6PM_IST.pdf
2 3 2020-03-19 13316 12426 168 ICMR_website_update_19March_10AM_IST_V2.pdf
3 4 2020-03-19 14175 13285 182 ICMR_website_update_19March_6PM_IST.pdf
4 5 2020-03-20 14376 13486 206 ICMR_website_update_20March_10AM_IST.pdf
In [102]:
#Data Cleaning and Processing:
In [103]:
df_hos_bed['totalnegative'] = df_hos_bed['TotalIndividualsTested'] - df_hos_bed['TotalPositiveCases']
In [104]:
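# Keep the last report of each day; copy so the new column below is added to an independent frame
df_hos_bed_per_day = df_hos_bed.drop_duplicates(subset=['DateTime'], keep='last').copy()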
df_hos_bed_per_day['test_results_posratio'] = round(df_hos_bed_per_day['TotalPositiveCases']/df_hos_bed_per_day['TotalIndividualsTested'], 3)
df_hos_bed_per_day.head()
Out[104]:
SNo DateTime TotalSamplesTested TotalIndividualsTested TotalPositiveCases Source totalnegative test_results_posratio
0 1 2020-03-13 6500 5900 78 Press_Release_ICMR_13March2020.pdf 5822 0.013
1 2 2020-03-18 13125 12235 150 ICMR_website_update_18March_6PM_IST.pdf 12085 0.012
3 4 2020-03-19 14175 13285 182 ICMR_website_update_19March_6PM_IST.pdf 13103 0.014
5 6 2020-03-20 15404 14514 236 ICMR_website_update_20March_6PM_IST.pdf 14278 0.016
7 8 2020-03-21 16911 16021 315 ICMR_website_update_21March_6PM_IST.pdf 15706 0.020

Validating ICMR Tests Results:

In [105]:
colors = ['#269A06', '#AF0E06']
negative = round(df_hos_bed['totalnegative'].sum()/df_hos_bed['TotalIndividualsTested'].sum()*100, 2)
positive = round(df_hos_bed['TotalPositiveCases'].sum()/df_hos_bed['TotalIndividualsTested'].sum()*100, 2)
fig = go.Figure(data=[go.Pie(labels=['People who tested Negative','People who tested Positive'],
                             values= [negative,positive],hole =.5)])
                          

fig.update_traces(title_text='COVID19 Test Results', hoverinfo='label+percent', textinfo='value', textfont_size=15,
                  marker=dict(colors=colors, line=dict(color='#FFFFFF', width=2)))
fig.show()

During the phase when tests were carried out on suspected cases, the majority of those tested were either citizens with recent travel history or their relatives and acquaintances. Testing was carried out mainly at airports, both targeted and at random. Hence, the share of negative results was very high.

Ratio of Positive Detection per Test w.r.t. Time

In [106]:
fig1 = go.Figure()
fig1.add_trace(go.Scatter(x=df_hos_bed_per_day['DateTime'], y=df_hos_bed_per_day['test_results_posratio'], name='Confirmed Cases', \
                         marker=dict(color='#D32210')))
fig1.layout.update(title_text='COVID-19 Positive Detection per Test Ratio in India w.r.t. Time',xaxis_showgrid=False, width=700,
        height=500,font=dict(
#         family="Courier New, monospace",
        size=12,
        color="white"
    ))
fig1.layout.plot_bgcolor = '#097E99'
fig1.layout.paper_bgcolor = '#097E99'
fig1.show()
Cumulatively, over a period of almost two weeks, 2.05% of the people tested for COVID-19 in India were found positive.
However, the positive ratio doubled between March 18 and March 25, from 1.2% to 2.4%, which cannot be ignored.
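
The positive ratio plotted above is positive cases divided by individuals tested, taken from the last report of each day. The standard-library sketch below mirrors that logic on the first five ICMR rows shown in `Out[101]`; the variable names are illustrative. Because only those five rows are used, the value for 20th March comes from the morning report rather than the evening one used in the notebook.

```python
# (date, individuals tested, positive cases) from the first ICMR rows shown above
reports = [
    ("2020-03-13", 5900, 78),
    ("2020-03-18", 12235, 150),
    ("2020-03-19", 12426, 168),
    ("2020-03-19", 13285, 182),
    ("2020-03-20", 13486, 206),
]

# Later reports for the same date overwrite earlier ones, keeping the last per day
last_per_day = {}
for date, tested, positive in reports:
    last_per_day[date] = (tested, positive)

# Positive cases per individual tested, rounded to 3 decimals
pos_ratio = {d: round(p / t, 3) for d, (t, p) in last_per_day.items()}

print(pos_ratio)
```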

Capturing Individual Details

In [107]:
df_indi = pd.read_csv('IndividualDetails.csv')
df_indi.head()
Out[107]:
id unique_id government_id diagnosed_date age gender detected_city detected_city_pt detected_district detected_state nationality current_status status_change_date notes current_location current_location_pt contacts
0 1 1 KL-TS-P1 2020-01-30 20.0 Female Thrissur SRID=4326;POINT (76.21325419999999 10.5256264) Thrissur Kerala India Recovered 2020-02-14 Travelled from Wuhan.\nStudent from Wuhan NaN SRID=4326;POINT (76.21325419999999 10.5256264) []
1 2 2 KL-AL-P1 2020-02-02 NaN Unknown Alappuzha SRID=4326;POINT (76.333482 9.498000100000001) Alappuzha Kerala India Recovered 2020-02-14 Travelled from Wuhan.\nStudent from Wuhan NaN SRID=4326;POINT (76.333482 9.498000100000001) []
2 3 3 KL-KS-P1 2020-02-03 NaN Unknown Kasaragod SRID=4326;POINT (80 20) Kasaragod Kerala India Recovered 2020-02-14 Travelled from Wuhan.\nStudent from Wuhan NaN SRID=4326;POINT (80 20) []
3 4 4 DL-P1 2020-03-02 45.0 Male East Delhi (Mayur Vihar) SRID=4326;POINT (80 20) East Delhi Delhi India Recovered 2020-03-15 Travelled from Austria, Italy.\nTravel history... NaN SRID=4326;POINT (80 20) [22,23,24,25,26,27,47]
4 5 5 TS-P1 2020-03-02 24.0 Male Hyderabad SRID=4326;POINT (78.4349398685041 17.4263524) Hyderabad Telangana India Recovered 2020-03-02 .\nTravel history to Dubai, Singapore contact NaN SRID=4326;POINT (78.4349398685041 17.4263524) []
In [108]:
df_indi.dropna(subset=['current_status', 'age'], inplace=True)
df_indi.reset_index(drop=True, inplace=True)
In [109]:
df_indi['current_status'].unique(), df_indi.shape
Out[109]:
(array(['Recovered', 'Hospitalized', 'Deceased'], dtype=object), (350, 17))
In [110]:
df1_indians = df_indi[df_indi['current_status'] == 'Deceased']
df3_indians = df_indi[df_indi['current_status'] == 'Hospitalized']
df2_indians = df_indi[df_indi['current_status'] == 'Recovered']
cdf = pd.concat([df1_indians, df2_indians, df3_indians])
plt.figure(figsize=(12,12))
sns.boxplot(x="current_status", y="age", data=cdf).set_title("India's Outcome till now Age-Wise")
plt.show()

Hospitalized patients belonged to the age group 22 to 60.

Recovered patients belonged to the age group 30 to 65.

Deceased patients were all senior citizens, except for one adult aged around 40.

Analysing Cases with and without Travel History:
In [111]:
pep_no_trav_his = df_indi[df_indi['notes'].str.contains('Travel') == False]
pep_with_trav_his = df_indi[df_indi['notes'].str.contains('Travel') == True]
df_indi['id'].nunique(), pep_no_trav_his['id'].nunique()
Out[111]:
(350, 81)
In [112]:
colors = ['#B5B200', '#1300B5']
negative = round(pep_no_trav_his['id'].nunique()/df_indi['id'].nunique()*100, 2)
positive = round(pep_with_trav_his['id'].nunique()/df_indi['id'].nunique()*100, 2)
                         
fig = px.pie(pep_no_trav_his, values=[negative, positive], names=['Patients w/o Travel History', 'Patients with Travel History'], \
             title='Patients with and without Travel History')
fig.show()

About 23% of the patients who tested positive had no recent travel history. (The majority of them were relatives, friends or other people who came directly or indirectly into contact with patients who had travelled.) Still, 23% is large enough to indicate a risk of the virus progressing to Stage III (community spread).
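The 23% figure follows directly from the two counts printed in Out[111]. The short check below only redoes that arithmetic; the variable names are illustrative and not part of the notebook. Note that the remainder is an upper bound on the "with travel history" share, because rows whose notes are missing match neither filter.

```python
# Counts taken from Out[111]: 350 patients in total, 81 without a 'Travel' note.
total_patients = 350
no_travel_history = 81

share_no_travel = round(no_travel_history / total_patients * 100, 2)
# Everything else: patients with a 'Travel' note plus any rows with missing notes.
share_remainder = round(100 - share_no_travel, 2)

print(share_no_travel)   # 23.14
print(share_remainder)   # 76.86
```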

In [113]:
#Data Processing
In [114]:
individual_details = df_indi.rename(columns=lambda x: x.strip())

cols_to_drop = ['unique_id','id','government_id','detected_city_pt','notes','current_location','current_location_pt','contacts']

filter_data = individual_details.drop(cols_to_drop,axis=1)

filter_data.head()
Out[114]:
diagnosed_date age gender detected_city detected_district detected_state nationality current_status status_change_date
0 2020-01-30 20.0 Female Thrissur Thrissur Kerala India Recovered 2020-02-14
1 2020-03-02 45.0 Male East Delhi (Mayur Vihar) East Delhi Delhi India Recovered 2020-03-15
2 2020-03-02 24.0 Male Hyderabad Hyderabad Telangana India Recovered 2020-03-02
3 2020-03-03 69.0 Male Jaipur Jaipur Rajasthan Italy Recovered 2020-03-03
4 2020-03-04 55.0 Unknown Gurugram Gurugram Haryana Italy Hospitalized 2020-03-04
In [115]:
# Convert dates in one format
import datetime as dt

filter_data['status_change_date'] = pd.to_datetime(filter_data['status_change_date'])
filter_data['diagnosed_date'] = pd.to_datetime(filter_data['diagnosed_date'])

filter_data['Duration of Any Status'] = filter_data['status_change_date'] - filter_data['diagnosed_date']
filter_data['Duration of Any Status'] = filter_data['Duration of Any Status'].dt.days

filter_data['status_change_date'] = filter_data['status_change_date'].dt.strftime('%Y-%m-%d')
filter_data['diagnosed_date'] = filter_data['diagnosed_date'].dt.strftime('%Y-%m-%d')
In [116]:
filter_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 350 entries, 0 to 349
Data columns (total 10 columns):
diagnosed_date            350 non-null object
age                       350 non-null float64
gender                    350 non-null object
detected_city             313 non-null object
detected_district         291 non-null object
detected_state            350 non-null object
nationality               216 non-null object
current_status            350 non-null object
status_change_date        350 non-null object
Duration of Any Status    347 non-null float64
dtypes: float64(2), object(8)
memory usage: 27.5+ KB
In [117]:
#Dropping detected city and district, since the state is available for every row


drop_cols = ['detected_city','detected_district']

covid_india_df = filter_data.drop(drop_cols,axis=1)

covid_india_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 350 entries, 0 to 349
Data columns (total 8 columns):
diagnosed_date            350 non-null object
age                       350 non-null float64
gender                    350 non-null object
detected_state            350 non-null object
nationality               216 non-null object
current_status            350 non-null object
status_change_date        350 non-null object
Duration of Any Status    347 non-null float64
dtypes: float64(2), object(6)
memory usage: 22.0+ KB
In [118]:
covid_india_df.describe()
Out[118]:
age Duration of Any Status
count 350.000000 347.000000
mean 40.877143 0.126801
std 17.940184 1.163537
min -1.000000 -1.000000
25% 26.000000 0.000000
50% 38.000000 0.000000
75% 55.000000 0.000000
max 96.000000 15.000000
In [119]:
#Filling NAs in age with median
covid_india_df.describe()
covid_india_df['age'] = covid_india_df['age'].fillna(covid_india_df.age.median())
covid_india_df['current_status'] = covid_india_df['current_status'].ffill()

covid_india_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 350 entries, 0 to 349
Data columns (total 8 columns):
diagnosed_date            350 non-null object
age                       350 non-null float64
gender                    350 non-null object
detected_state            350 non-null object
nationality               216 non-null object
current_status            350 non-null object
status_change_date        350 non-null object
Duration of Any Status    347 non-null float64
dtypes: float64(2), object(6)
memory usage: 22.0+ KB
In [120]:
covid_india_df.head()
Out[120]:
diagnosed_date age gender detected_state nationality current_status status_change_date Duration of Any Status
0 2020-01-30 20.0 Female Kerala India Recovered 2020-02-14 15.0
1 2020-03-02 45.0 Male Delhi India Recovered 2020-03-15 13.0
2 2020-03-02 24.0 Male Telangana India Recovered 2020-03-02 0.0
3 2020-03-03 69.0 Male Rajasthan Italy Recovered 2020-03-03 0.0
4 2020-03-04 55.0 Unknown Haryana Italy Hospitalized 2020-03-04 0.0
In [121]:
# Now at the broader scale by looking at the Duration. And we'll see if we can do any Clustering
plt.figure(figsize=(18,9))
sns.scatterplot(y = covid_india_df['Duration of Any Status'],x = covid_india_df['current_status']);
plt.xlabel('Status of the Patient');
plt.ylabel('Duration of Days from the time they were Admitted');
plt.title('Distribution of Duration in Days by Status of Patients');

Impact of Covid-19 on different Age Groups:

In [122]:
# dissecting age into bins to see which age group was affected most by covid-19
# Taking a broad age group to form bins 
age_bins = [0,20,40,60,80,100]
plt.figure(figsize=(16,6))
sns.countplot(x=pd.cut(covid_india_df.age, age_bins), hue=covid_india_df.current_status, palette = ['#263A90', '#FFFF00', '#ee0a0a'])
plt.xticks(rotation=90)
plt.xlabel("Age Groups")
plt.yscale('log')
plt.title("Age Groups affected with Covid-19")
plt.grid(True)
plt.show()

People with Age beyond 40s were found to be more vulnerable.

It was observed that Senior Citizens were at potential risk as their Death rate was higher than Recovery Rate.

People below 40 had higher chances of recovery.

In age group of Children and young-adults, no deaths were reported.
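To make the grouping behind these observations concrete: with its default settings, pd.cut builds right-closed intervals, so an age of exactly 20 lands in (0, 20] and an age of 21 in (20, 40]. The dependency-free sketch below mimics that rule with the same bin edges; age_group is a hypothetical helper, not something the notebook defines. It also shows that a value such as the -1 reported as the minimum age by describe() falls outside every bin.

```python
from bisect import bisect_left

AGE_BINS = [0, 20, 40, 60, 80, 100]  # same edges as age_bins in the notebook

def age_group(age, edges=AGE_BINS):
    """Return the right-closed interval (low, high] containing age, or None."""
    if not (edges[0] < age <= edges[-1]):
        return None  # outside every bin, e.g. an age recorded as -1
    idx = bisect_left(edges, age) - 1
    return (edges[idx], edges[idx + 1])

for age in (20, 21, 96, -1):
    print(age, age_group(age))
```

Ages that return None here correspond to rows that pd.cut would label NaN, so they silently drop out of the count plot.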

Estimation of realistic number of cases in India:

Abstract:

The first COVID-19 case in India was reported on 30th January 2020, a student who arrived in Kerala state from Wuhan, China followed by 2 more cases in Kerala. For almost a month, no new cases were reported in India.

However, on 8th March 2020, five new cases of COVID-19 in Kerala were again reported and since then the cases had been rising affecting 22 states and resulting in 19 deaths across the country as of 28th March 2020.

On 13th March 2020, India reported its first coronavirus fatality in the state of Karnataka, followed by 3 more deaths in other states.

On 28th March 2020, confirmed COVID-19 cases had risen to 775 with the state of Maharashtra bagging the maximum number of cases.

As the number of confirmed cases increased in India, the question arose: did those numbers represent the true number of cases in India?


Three statistically determined variables that were used to obtain the estimation curve:

doubling_rate = 6 days

Mortality_rate = 0.01 or 1 %

days_to_death = 17 days

Doubling rate:

It is the time it takes for the number of cases to double. For Coronavirus, it was found to be around 6 days on average. For now, the same rate is assumed for the sake of analysis.
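As a quick illustration of what a 6-day doubling time means, the sketch below projects a case count forward under pure exponential growth. projected_cases is a hypothetical helper, and the starting value of 100 is chosen only for readability.

```python
def projected_cases(initial_cases, days, doubling_days=6):
    """Cases after `days` days if the count doubles every `doubling_days` days."""
    return initial_cases * 2 ** (days / doubling_days)

for day in (0, 6, 12, 18):
    print(day, projected_cases(100, day))
# 0 100.0
# 6 200.0
# 12 400.0
# 18 800.0
```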

Mortality rate:

It is assumed to be around 1%. Reason: countries that were well prepared had fatality rates of ~0.5% (South Korea) to ~0.9% (rest of China), while countries that were underprepared had fatality rates between ~3% and ~5%. In other words, countries that acted fast were able to reduce the number of deaths by a factor of ten. Acting fast includes massive testing and measures such as lockdown, social distancing and quarantine to reduce the rate of spread (flatten the curve).

Days to death:

It is statistically determined to be around 17 days. For now, the same rate is assumed for the sake of analysis.
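Taken together, the three parameters give a back-of-the-envelope estimate: one death today implies roughly 1 / Mortality_rate infections about days_to_death days ago, and those infections have kept doubling since. The sketch below uses the continuous doubling formula; the notebook's own loop doubles in discrete 6-day steps with integer smoothing, so its numbers differ somewhat. estimate_current_cases is a hypothetical name used only for this illustration.

```python
def estimate_current_cases(deaths_today, mortality_rate=0.01,
                           days_to_death=17, doubling_days=6):
    """Return (cases when the deceased were infected, cases implied for today)."""
    cases_at_infection = deaths_today / mortality_rate
    doublings_since = days_to_death / doubling_days
    return cases_at_infection, cases_at_infection * 2 ** doublings_since

past, present = estimate_current_cases(1)
print(round(past))     # 100 cases about 17 days before the death
print(round(present))  # roughly seven times as many by the day of the death
```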

In [123]:
doubling_rate = 6
Mortality_rate = 0.01
days_to_death = 17
In [124]:
from ipywidgets import widgets
DR_slider = widgets.FloatSlider(
    value=6,
    min=2.0,
    max=8.0,
    step=0.1,
    description='Doubling Rate:',
    readout=True,
    readout_format='.1f',
    continuous_update=True
)

M_slider = widgets.FloatSlider(
    value=1.0,
    min=0.5,
    max=10.0,
    step=0.1,
    description='Mortality Rate %:',
    readout=True,
    readout_format='.1f',
    continuous_update=True
)
Mortality_rate = M_slider.value/100


Death_slider = widgets.FloatSlider(
    value=17,
    min=10.0,
    max=60.0,
    step=0.5,
    description='Days to Death:',
    readout=True,
    readout_format='.1f',
    continuous_update=True
)
days_to_death = Death_slider.value
In [126]:
death_df = covid_india.copy()
death_df_filtered = death_df[death_df.Deaths != 0]

### Finding states with deaths
states = death_df_filtered.States.unique().tolist()
#print("{}\n".format(states))

####Number of days since Jan 30 2020
i = covid_india.iloc[[0]].Date.tolist()[0]
j = death_df.iloc[-1].Date
size = int((j-i).days)+1
In [127]:
actual_total = np.zeros(size, dtype = int) ## Stores total realistic deaths
actual_total_temp = np.zeros(size, dtype = int) ## Stores deaths of individual states and dates
In [128]:
### Loop over each state (the ones with deaths)
count=0
for state in states:
    temp = death_df_filtered[death_df_filtered.States == state]
    #for death in temp
    #deathi = temp.iloc[[0]].Deaths.tolist()[0]
    a = temp.drop_duplicates(subset='Deaths', keep="first").Deaths.tolist()
    b = np.diff(a)
    deaths = np.concatenate(([a[0]], b), axis=0)
In [129]:
### Loop over each state (the ones with deaths)
count=0
for state in states:
    temp = death_df_filtered[death_df_filtered.States == state]
    #for death in temp
    #deathi = temp.iloc[[0]].Deaths.tolist()[0]
    a = temp.drop_duplicates(subset='Deaths', keep="first").Deaths.tolist()
    b = np.diff(a)
    deaths = np.concatenate(([a[0]], b), axis=0)
    #print(deaths)
    
    
    ### Loop over each day for each state
    for i in range(0, len(deaths)):
        ### Go back days_to_death days (17 with the default settings)
        start = temp.iloc[[i]].Date - timedelta(days=int(days_to_death))
        start = int((death_df.iloc[-1].Date - start.tolist()[0]).days)
        #print(start)
        
        ### Calculating the realistic cases for each day and stored in array for each state for each date
        actual_total_temp[size - start-2] = deaths[i]/Mortality_rate
        for i in range(size - start + int(doubling_rate) - 2, size, int(doubling_rate)):
            actual_total_temp[i] = actual_total_temp[i-int(doubling_rate)]*2
        #print("{}\n".format(actual_total_temp))
        
        ### Smoothening the curve for each array formed above
        for i in range(size - start - 2, size-1, int(doubling_rate)):
            smoother = int(actual_total_temp[i]/doubling_rate)
            for j in range(i+1, i+int(doubling_rate)):
                actual_total_temp[j] = actual_total_temp[j-1] + smoother
                if j == size-1:
                    i = size-1
                    break
                    
        
        print("{}\n".format(actual_total_temp))
        ### Adding each state for each day to actual_total (Array)
        actual_total = np.add(actual_total,actual_total_temp)
        actual_total_temp = actual_total_temp*0

#### Finally, actual_total is plotted next

display(DR_slider, M_slider, Death_slider)
[   0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0  100  116  132
  148  164  180  200  233  266  299  332  365  400  466  532  598  664
  730  800  933 1066 1199 1332 1465 1600 1866 2132 2398 2664 2930 3200
 3733 4266 4799]

[   0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0  100  116
  132  148  164  180  200  233  266  299  332  365  400  466  532  598
  664  730  800  933 1066 1199 1332 1465 1600 1866 2132 2398 2664 2930
 3200 3733 4266]

[   0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0  100  116
  132  148  164  180  200  233  266  299  332  365  400  466  532  598
  664  730  800  933 1066 1199 1332 1465 1600 1866 2132 2398 2664 2930
 3200 3733 4266]

[   0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0  100  116  132  148  164  180  200  233  266  299  332  365  400
  466  532  598  664  730  800  933 1066 1199 1332 1465 1600 1866 2132
 2398 2664 2930]

[   0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0  100  116  132  148  164  180  200  233  266  299  332  365
  400  466  532  598  664  730  800  933 1066 1199 1332 1465 1600 1866
 2132 2398 2664]

[   0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0  100  116  132  148  164  180  200  233  266  299  332
  365  400  466  532  598  664  730  800  933 1066 1199 1332 1465 1600
 1866 2132 2398]

[   0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0  100  116  132  148  164  180  200  233  266  299
  332  365  400  466  532  598  664  730  800  933 1066 1199 1332 1465
 1600 1866 2132]

[   0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0  100  116  132  148  164  180  200  233  266
  299  332  365  400  466  532  598  664  730  800  933 1066 1199 1332
 1465 1600 1866]

[   0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0  100  116  132  148  164  180  200  233  266  299  332
  365  400  466  532  598  664  730  800  933 1066 1199 1332 1465 1600
 1866 2132 2398]

[   0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0  100  116  132  148  164  180  200  233
  266  299  332  365  400  466  532  598  664  730  800  933 1066 1199
 1332 1465 1600]

[   0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0  100  116  132  148  164  180  200  233
  266  299  332  365  400  466  532  598  664  730  800  933 1066 1199
 1332 1465 1600]

[   0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0  200  233  266  299  332  365  400
  466  532  598  664  730  800  933 1066 1199 1332 1465 1600 1866 2132
 2398 2664 2930]

[   0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0  100  116  132  148  164  180
  200  233  266  299  332  365  400  466  532  598  664  730  800  933
 1066 1199 1332]

[   0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0  100  116  132  148  164  180
  200  233  266  299  332  365  400  466  532  598  664  730  800  933
 1066 1199 1332]

[   0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0  100  116  132  148
  164  180  200  233  266  299  332  365  400  466  532  598  664  730
  800  933 1066]

[   0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    0    0    0    0    0    0    0    0    0    0  100  116  132  148
  164  180  200  233  266  299  332  365  400  466  532  598  664  730
  800  933 1066]

[  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0 100 116 132 148 164 180 200 233 266 299 332 365 400 466 532
 598 664 730 800 933]

[  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0 100 116 132 148 164 180 200 233 266 299 332 365 400 466 532
 598 664 730 800 933]
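Each array printed above appears to follow the same pattern: it starts at deaths / Mortality_rate (100 for one death, 200 for two), doubles every doubling_rate days, and fills the days in between with equal integer steps. The standalone sketch below reproduces that shape for a single death event; stepwise_doubling is a hypothetical helper written for illustration, not the notebook's loop.

```python
def stepwise_doubling(initial, doubling_days, n_days):
    """Doubling every `doubling_days` days with equal integer steps in between."""
    series = [0] * n_days
    series[0] = initial
    # Anchor points: the value doubles every `doubling_days` days.
    for day in range(doubling_days, n_days, doubling_days):
        series[day] = series[day - doubling_days] * 2
    # Fill the days between anchors with a constant integer increment.
    for anchor in range(0, n_days, doubling_days):
        step = int(series[anchor] / doubling_days)
        for day in range(anchor + 1, min(anchor + doubling_days, n_days)):
            series[day] = series[day - 1] + step
    return series

print(stepwise_doubling(100, 6, 14))
# [100, 116, 132, 148, 164, 180, 200, 233, 266, 299, 332, 365, 400, 466]
```

Because the increment is truncated to an integer, the last filled day before each anchor falls short of the doubled value (180 then 200, 365 then 400), which explains the small jumps visible in the printed arrays.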

In [130]:
covid_india.sort_values(by=['States', 'Total Confirmed cases'], ascending = [True, False],inplace=True)
covid_india.drop_duplicates(subset='States', keep="first",inplace=True)
covid_india.sort_values(by='Total Confirmed cases', ascending = False,inplace=True)
covid_india.index = range(1,covid_india.shape[0]+1)

print(f'Total number of Confirmed COVID 2019 cases across India:', covid_india['Total Confirmed cases'].sum())
print('Estimation of realistic number of cases across India: {} - {} (Around {}-{}x Confirmed cases)'.format(int(actual_total[-1]/2), actual_total[-1], 
                                                                                                       int(actual_total[-1]/2/covid_india['Total Confirmed cases'].sum()), 
                                                                                                       int(actual_total[-1]/covid_india['Total Confirmed cases'].sum())))
print('\n')
print(f'Total number of Active COVID 2019 cases across India:', covid_india['Total Active cases'].sum())
print(f'Total number of Cured/Discharged/Migrated COVID 2019 cases across India:', covid_india['Cured/Discharged/Migrated'].sum())
print(f'Total number of Deaths due to COVID 2019  across India:', covid_india['Deaths'].sum())
print(f'Total number of States/UTs affected:', covid_india['States'].count())


dbd_India = pd.read_excel('per_day_cases.xlsx',sheet_name='India')
fig = go.Figure()
fig.add_trace(go.Scatter(x=dbd_India['Date'], y=dbd_India['Total Cases'],
                    mode='lines+markers',name='Total Cases'))

fig.add_trace(go.Scatter(x=dbd_India['Date'], y=dbd_India['Recovered'], 
                mode='lines',name='Recovered'))
fig.add_trace(go.Scatter(x=dbd_India['Date'], y=dbd_India['Active'], 
                mode='lines',name='Active'))
fig.add_trace(go.Scatter(x=dbd_India['Date'], y=dbd_India['Deaths'], 
                mode='lines',name='Deaths'))
fig.add_trace(go.Scatter(x=dbd_India['Date'], y=actual_total, 
                mode='lines+markers',name='Estimate of real Total cases'))
    
fig.update_layout(title_text='Trend of Coronavirus Cases in India(Confirmed vs Realistic cases)',plot_bgcolor='rgb(250, 242, 242)')

fig.show()

fig = go.Figure()
fig.add_trace(go.Scatter(x=dbd_India['Date'], y=dbd_India['Deaths'], 
                mode='lines',name='Deaths'))
fig.update_layout(title_text='Deaths',plot_bgcolor='rgb(250, 242, 242)')

fig.show()

import plotly.express as px
fig = go.Figure(data=[
    go.Bar(name = "Confirmed cases", x=dbd_India.Date.tolist(), y=dbd_India['New Cases'].tolist()),
    go.Bar(name = "Realistic", x=dbd_India.Date.tolist(), y=np.diff(actual_total))
])
fig.update_layout(barmode='group')
fig.update_layout(title_text='New Coronavirus Cases in India per day',plot_bgcolor='rgb(250, 242, 242)')

fig.show()
Total number of Confirmed COVID 2019 cases across India: 873
Estimation of realistic number of cases across India: 20255 - 40511 (Around 23-46x Confirmed cases)


Total number of Active COVID 2019 cases across India: 778
Total number of Cured/Discharged/Migrated COVID 2019 cases across India: 76
Total number of Deaths due to COVID 2019  across India: 19
Total number of States/UTs affected: 27
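The "23-46x" multiplier in this output can be checked by hand from the printed totals. The snippet below repeats that arithmetic with the numbers copied from the output; the variable names are illustrative. The lower bound is simply half of the upper estimate, as in the print statement above.

```python
confirmed = 873          # total confirmed cases from the output
upper_estimate = 40511   # actual_total[-1] from the output

lower_estimate = int(upper_estimate / 2)
lower_factor = int(upper_estimate / 2 / confirmed)
upper_factor = int(upper_estimate / confirmed)

print(lower_estimate, upper_estimate)  # 20255 40511
print(lower_factor, upper_factor)      # 23 46

# Active + cured/discharged/migrated + deaths should add up to the confirmed total.
print(778 + 76 + 19 == confirmed)      # True
```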

Indeterminable variables of the model

The model used was not perfect; it only gave an idea of how many cases could actually be present.

There were a lot of additional variables to consider. For instance, the age distribution in each country would also have an impact: since mortality was observed to be much higher for older people, regions with an aging population, such as Europe, would on average be hit harder than younger countries like India. There were further factors, such as weather, environment, food and lifestyle habits, but it is still unclear how these would affect transmission and fatality rates.

One more thing to consider was the number of tests per million people and the number of hospital beds per 1000 people. In the current situation, India ranks among the lowest in both. Many cases and deaths might go unreported, especially in rural areas, because of the lack of testing and the prevalence of self-treatment in India.

Current Scenario

Around 20% of cases require hospitalization, 5% of cases require the Intensive Care Unit (ICU), and around 2.5% require very intensive help, with equipment such as ventilators or ECMO (extracorporeal membrane oxygenation).

Best Solution

Social distancing, maintaining hygiene, and following the guidelines laid down in the public interest.

COVID-19: Italy vs. S.Korea vs. India

In [131]:
india = pd.read_excel('per_day_cases.xlsx',sheet_name='India')
italy = pd.read_excel('per_day_cases.xlsx',sheet_name="Italy")
korea = pd.read_excel('per_day_cases.xlsx',sheet_name="Korea")
india = india.fillna(0)
korea = korea.fillna(0)
italy = italy.fillna(0)
In [132]:
import seaborn as sns
plt.figure(figsize=[19,14])
sns.set(style='darkgrid',font_scale=2)

plt.title("Comparative distribution of rise of total cases reported of COVID-19 from Italy, Korea , India")
ax1 = sns.lineplot(x = "Date", y = "Total Cases", data = india,markers = True, dashes = False,label = 'India')
ax1 = sns.lineplot(x = "Date", y = "Total Cases", data = korea,color = 'orange',label = 'korea')
ax1 = sns.lineplot(x = "Date", y = "Total Cases", data = italy,color = 'red',label = 'Italy')
ax1.set(xlabel = 'Date', ylabel = 'Number of Total Cases Reported')
plt.xticks(rotation = 90)
plt.show()

South Korea recorded its first case on 20/01/2020. Its plan of action worked well, and the spread appears to have been contained effectively.

Italy reported its first positive case on 29/01/2020. Since then, Italy's figures had been climbing steeply.

The first case in India was reported on 30/01/2020. India had contained the virus well until the start of March; since then the figures have been rising exponentially. The starting phase is denoted in the graph.

In [133]:
plt.figure(figsize=[22,14])
sns.set(style='darkgrid',font_scale=2)

plt.xticks(rotation=90)
plt.title("Comparative distribution of rise of New Cases reported of COVID-19 from Italy, Korea , India")
ax = sns.lineplot(x="Date", y="New Cases", data=india,markers=True, dashes=False,label='India')
ax = sns.lineplot(x="Date", y="New Cases", data=korea,color='orange',label='korea')
ax = sns.lineplot(x="Date", y="New Cases", data=italy,color='red',label='Italy')
ax.set(xlabel='Date',ylabel='Number of New Cases Reported')
plt.show()

In South Korea, the number of new cases had been declining since March. By the third week of March, only a few new cases were recorded daily. South Korea controlled the outbreak efficiently.

In contrast, Italy, despite its world-class healthcare system, failed to control the situation. Its inefficiency in managing the COVID-19 outbreak is evident from the graph above, where the number of new cases registered daily rose throughout the period.

In India, the number of new cases registered daily began to rise from the third week of March, following which a lockdown was implemented in the country.

In [134]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=india['Date'], y=india['Total Cases'],
                    mode='lines+markers',name='India'))
fig.add_trace(go.Scatter(x=italy['Date'], y=italy['Total Cases'],
                    mode='lines+markers',name='Italy'))
fig.add_trace(go.Scatter(x=korea['Date'], y=korea['Total Cases'],
                    mode='lines+markers',name='S.Korea'))
fig.show()

The previous graph showed that the count of daily new cases was constantly rising in Italy; hence the steep rise in total cases seen for Italy here. By the end of March, 69.17k people in Italy were infected. Because the count of daily new cases declined very fast in S.Korea, its total number of COVID-19 cases rose slowly.

Comparing Rise in Cases in India with Italy and S.Korea before and after surpassing 100 Cases:
In [135]:
india.head()
Out[135]:
Date Total Cases New Cases Active Recovered Deaths Days after surpassing 100 cases
0 2020-01-30 1 1 1 0 0 0.0
1 2020-01-31 1 0 1 0 0 0.0
2 2020-02-01 1 0 1 0 0 0.0
3 2020-02-02 2 1 2 0 0 0.0
4 2020-02-03 3 1 3 0 0 0.0
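The 'Days after surpassing 100 cases' column shown above is read straight from the spreadsheet. One way such a column could be derived from cumulative totals is sketched below, assuming the convention that days at or below 100 cases are marked 0 and counting starts at 1 on the first day above 100. Both that convention and the sample numbers are assumptions made for illustration, not values from the dataset.

```python
def days_after_threshold(total_cases, threshold=100):
    """0 until the cumulative total first exceeds `threshold`, then 1, 2, 3, ..."""
    counter = 0
    result = []
    for cases in total_cases:
        # Once counting has started it continues, even if a total were revised down.
        if cases > threshold or counter > 0:
            counter += 1
        result.append(counter)
    return result

totals = [62, 81, 97, 110, 125, 142]  # made-up cumulative counts
print(days_after_threshold(totals))   # [0, 0, 0, 1, 2, 3]
```

With a column built this way, the filters `== 0` and `> 0` used in the next cell split each country's series into its "before" and "after" phases.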
In [136]:
plt.figure(figsize=[28,28])

plt.subplot(2,2,1)
ax1 = sns.lineplot(x="Date", y="Total Cases", data=india[india['Days after surpassing 100 cases']==0],markers=True, dashes=False,label='Rise in Total cases in India Before surpassing 100 cases Benchmark',color='green')
ax1.set(xlabel='Date',ylabel='Number of Total Cases Reported')
plt.xticks(rotation=90)

plt.subplot(2,2,2)
ax2= sns.lineplot(x="Date", y="Total Cases", data=india[india['Days after surpassing 100 cases']>0],markers=True, dashes=False,label='Rise in Total cases in India After surpassing 100 cases Benchmark',color='red')
ax2.set(xlabel='Date',ylabel='Number of Total Cases Reported')
plt.xticks(rotation=90)
plt.show()

plt.figure(figsize=[28,28])


plt.subplot(2,2,1)
ax1 = sns.lineplot(x="Date", y="Total Cases", data=korea[korea['Days after surpassing 100 cases']==0],markers=True, dashes=False,label='Rise in Total cases in Korea Before surpassing 100 cases Benchmark',color='green')
ax1.set(xlabel='Date',ylabel='Number of Total Cases Reported ')
plt.xticks(rotation=90)

plt.subplot(2,2,2)
ax2= sns.lineplot(x="Date", y="Total Cases", data=korea[korea['Days after surpassing 100 cases']>0],markers=True, dashes=False,label='Rise in Total cases in Korea After surpassing 100 cases Benchmark',color='red')
ax2.set(xlabel='Date',ylabel='Number of Total Cases Reported')
plt.xticks(rotation=90)
plt.show()

plt.figure(figsize=[34,28])


plt.subplot(2,2,1)
ax1 = sns.lineplot(x="Date", y="Total Cases", data = italy[italy['Days after surpassing 100 cases']==0],markers=True, dashes=False,label='Rise in Total cases in Italy Before surpassing 100 cases Benchmark',color='green')
ax1.set(xlabel='Date',ylabel='Number of Total Cases Reported ')
plt.xticks(rotation=90)

plt.subplot(2,2,2)
ax2= sns.lineplot(x="Date", y="Total Cases", data=italy[italy['Days after surpassing 100 cases']>0],markers=True, dashes=False,label='Rise in Total cases in Italy After surpassing 100 cases Benchmark',color='red')
ax2.set(xlabel='Date',ylabel='Number of Total Cases Reported')
plt.xticks(rotation=90)
plt.show()

Forecasting Month's predictions using ML Algorithms:

In [137]:
#Loading Global Data:
In [138]:
confirmed_df = pd.read_csv('time_series_covid_19_confirmed.csv')
deaths_df = pd.read_csv('time_series_covid_19_deaths.csv')
In [139]:
confirmed_df[confirmed_df['Country/Region']=='India']
Out[139]:
Province/State Country/Region Lat Long 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 ... 3/19/20 3/20/20 3/21/20 3/22/20 3/23/20 3/24/20 3/25/20 3/26/20 3/27/20 3/28/20
131 NaN India 21.0 78.0 0 0 0 0 0 0 ... 194 244 330 396 499 536 657 727 887 987

1 rows × 71 columns

In [140]:
cols = confirmed_df.keys()            

Get all the dates for the outbreak

In [141]:
confirmed = confirmed_df.loc[:, cols[4]:cols[-1]]
deaths = deaths_df.loc[:, cols[4]:cols[-1]]
confirmed
Out[141]:
1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 1/28/20 1/29/20 1/30/20 1/31/20 ... 3/19/20 3/20/20 3/21/20 3/22/20 3/23/20 3/24/20 3/25/20 3/26/20 3/27/20 3/28/20
0 0 0 0 0 0 0 0 0 0 0 ... 22 24 24 40 40 74 84 94 110 110
1 0 0 0 0 0 0 0 0 0 0 ... 64 70 76 89 104 123 146 174 186 197
2 0 0 0 0 0 0 0 0 0 0 ... 87 90 139 201 230 264 302 367 409 454
3 0 0 0 0 0 0 0 0 0 0 ... 53 75 88 113 133 164 188 224 267 308
4 0 0 0 0 0 0 0 0 0 0 ... 0 1 2 2 3 3 3 4 4 5
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
248 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 8 8
249 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 2
250 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 2
251 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 4
252 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 2

253 rows × 67 columns

In [142]:
dates = confirmed.keys()
world_cases = []
total_deaths = [] 
mortality_rate = []
india_cases = [] 

for i in dates:
    confirmed_sum = confirmed[i].sum()
    death_sum = deaths[i].sum()
    
    
    world_cases.append(confirmed_sum)
    total_deaths.append(death_sum)

    # calculate rates
    mortality_rate.append(death_sum/confirmed_sum)

    india_cases.append(confirmed_df[confirmed_df['Country/Region']=='India'][i].sum())
In [143]:
def daily_increase(data):
    d = [] 
    for i in range(len(data)):
        if i == 0:
            d.append(data[0])
        else:
            d.append(data[i]-data[i-1])
    return d 

world_daily_increase = daily_increase(world_cases)
india_daily_increase = daily_increase(india_cases)
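As a usage example, the same differencing can be applied to the last ten cumulative values of the India row in Out[139] (3/19/20 to 3/28/20). The compact version below is a sketch that should behave like daily_increase above; note that the first element is the starting cumulative value itself, not a one-day increase, because the window begins mid-series.

```python
def daily_increase(data):
    """First value as-is, then the day-over-day differences."""
    return [data[0]] + [curr - prev for prev, curr in zip(data, data[1:])]

# Cumulative confirmed cases for India, 3/19/20 to 3/28/20, from Out[139].
india_tail = [194, 244, 330, 396, 499, 536, 657, 727, 887, 987]
print(daily_increase(india_tail))
# [194, 50, 86, 66, 103, 37, 121, 70, 160, 100]
```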
In [144]:
days_since_1_22 = np.array([i for i in range(len(dates))]).reshape(-1, 1)
india_cases = np.array(india_cases).reshape(-1, 1)
total_deaths = np.array(total_deaths).reshape(-1, 1)
Future forecasting
In [145]:
days_in_future = 30
future_forcast = np.array([i for i in range(len(dates)+days_in_future)]).reshape(-1, 1)
adjusted_dates = future_forcast[:-days_in_future]
In [146]:
import datetime
start = '1/22/2020'
start_date = datetime.datetime.strptime(start, '%m/%d/%Y')
future_forcast_dates = []
for i in range(len(future_forcast)):
    future_forcast_dates.append((start_date + datetime.timedelta(days=i)).strftime('%m/%d/%Y'))
In [147]:
#Importing Libraries:
In [148]:
from sklearn.linear_model import LinearRegression, BayesianRidge
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, mean_absolute_error

train_test_split

In [149]:
seed = 1
X_train_confirmed, X_test_confirmed, y_train_confirmed, y_test_confirmed = train_test_split(days_since_1_22, india_cases, test_size=0.28, random_state = seed) 
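The split above shuffles the days at random, so the test set contains days that lie between training days. For a forecasting task, a chronological hold-out (train on the earlier days, test on the most recent ones) is a common alternative. The sketch below shows that different technique with a hypothetical helper, chronological_split; it is not what the notebook does, and the MAE/MSE values reported later would change if it were used. scikit-learn's train_test_split offers the same behaviour through shuffle=False.

```python
import math

def chronological_split(X, y, test_size=0.28):
    """Hold out the last `test_size` share of the samples, keeping time order."""
    n_test = math.ceil(len(X) * test_size)
    split = len(X) - n_test
    return X[:split], X[split:], y[:split], y[split:]

days = list(range(67))   # 67 daily columns, 1/22/20 to 3/28/20
cases = [0] * 67         # placeholder targets, for illustration only
X_tr, X_te, y_tr, y_te = chronological_split(days, cases)
print(len(X_tr), len(X_te), X_te[0])  # 48 19 48
```

The helper relies only on slicing, so it works on plain lists as well as on NumPy arrays such as days_since_1_22.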

Building Models for predicting no. of confirmed cases.

SVM (SVR), Linear Regression (polynomial) and Bayesian Ridge Regression were considered for the case study
In [150]:
svm_confirmed = SVR(shrinking=True, kernel='poly',gamma=0.01, epsilon=1,degree=6, C=0.1)
svm_confirmed.fit(X_train_confirmed, y_train_confirmed.ravel())  # SVR expects a 1-D target
svm_pred = svm_confirmed.predict(future_forcast)
In [151]:
# check against testing data
svm_test_pred = svm_confirmed.predict(X_test_confirmed)
plt.plot(svm_test_pred)
plt.plot(y_test_confirmed)
print('MAE:', mean_absolute_error(svm_test_pred, y_test_confirmed))
print('MSE:',mean_squared_error(svm_test_pred, y_test_confirmed))
MAE: 41.3639537598044
MSE: 3248.1318285044244
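For reference, MAE is the mean of the absolute errors and MSE is the mean of the squared errors. Taking the square root of the MSE above gives an error of roughly 57 cases, in the same units as the data. The sketch below spells out both formulas on a few made-up numbers; the function names are hypothetical stand-ins for the scikit-learn metrics.

```python
def mean_abs_error(y_true, y_pred):
    """Average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_sq_error(y_true, y_pred):
    """Average of (true - predicted) squared."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up values, for illustration only
y_true = [10, 20, 30, 40]
y_pred = [12, 18, 33, 40]

mae = mean_abs_error(y_true, y_pred)   # (2 + 2 + 3 + 0) / 4
mse = mean_sq_error(y_true, y_pred)    # (4 + 4 + 9 + 0) / 4
rmse = mse ** 0.5                      # back in the units of the data
print(mae, mse, round(rmse, 3))
```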
In [152]:
# transform our data for polynomial regression
poly = PolynomialFeatures(degree=5)
poly_X_train_confirmed = poly.fit_transform(X_train_confirmed)
poly_X_test_confirmed = poly.transform(X_test_confirmed)  # reuse the transformer fitted on the training data
poly_future_forcast = poly.transform(future_forcast)
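With a single input column and the default include_bias=True, PolynomialFeatures(degree=5) produces the columns 1, x, x^2, ..., x^5. The sketch below builds the same kind of matrix with NumPy to show what the regression actually receives; the input values are arbitrary.

```python
import numpy as np

x = np.array([0, 1, 2, 3])   # arbitrary day indices
degree = 5

# Columns are x**0, x**1, ..., x**degree
design = np.vander(x, N=degree + 1, increasing=True)
print(design.shape)
print(design[2].tolist())    # powers of 2
```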
In [153]:
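# polynomial regression
# note: normalize is ignored when fit_intercept=False, and the parameter was removed in scikit-learn 1.2
linear_model = LinearRegression(normalize=True, fit_intercept=False)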
linear_model.fit(poly_X_train_confirmed, y_train_confirmed)
test_linear_pred = linear_model.predict(poly_X_test_confirmed)
linear_pred = linear_model.predict(poly_future_forcast)
print('MAE:', mean_absolute_error(test_linear_pred, y_test_confirmed))
print('MSE:',mean_squared_error(test_linear_pred, y_test_confirmed))
MAE: 12.746551655752087
MSE: 250.62099185191792
In [154]:
print(linear_model.coef_)
[[-1.12307868e+01  8.01779113e+00 -1.15882145e+00  6.32553592e-02
  -1.44570509e-03  1.17956923e-05]]
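Because the model is fitted with fit_intercept=False on the columns 1, x, ..., x^5, these six coefficients define the whole forecast curve, with the constant term first. As a rough check, the sketch below evaluates that polynomial by hand with the printed (rounded) coefficients. It assumes day 0 is 1/22/2020, so day 96 corresponds to 04/27/2020, the last date in the forecast output; the helper name poly_predict is hypothetical.

```python
# Coefficients copied from the printed linear_model.coef_ (constant term first)
coef = [-1.12307868e+01, 8.01779113e+00, -1.15882145e+00,
        6.32553592e-02, -1.44570509e-03, 1.17956923e-05]

def poly_predict(day, coef):
    """Evaluate sum(coef[k] * day**k) using Horner's rule."""
    result = 0.0
    for c in reversed(coef):
        result = result * day + c
    return result

# Assumed mapping: day 96 is 04/27/2020
pred_last = poly_predict(96, coef)
print(round(pred_last))
```

The result should land close to the 19431 reported for 04/27/2020 in the polynomial regression forecast; small differences come from the rounding of the printed coefficients.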
In [155]:
plt.plot(test_linear_pred)
plt.plot(y_test_confirmed)
Out[155]:
[<matplotlib.lines.Line2D at 0x147de0e2448>]
In [156]:
# bayesian ridge polynomial regression
tol = [1e-4, 1e-3, 1e-2]
alpha_1 = [1e-7, 1e-6, 1e-5, 1e-4]
alpha_2 = [1e-7, 1e-6, 1e-5, 1e-4]
lambda_1 = [1e-7, 1e-6, 1e-5, 1e-4]
lambda_2 = [1e-7, 1e-6, 1e-5, 1e-4]

bayesian_grid = {'tol': tol, 'alpha_1': alpha_1, 'alpha_2' : alpha_2, 'lambda_1': lambda_1, 'lambda_2' : lambda_2}

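# note: normalize is ignored when fit_intercept=False, and the parameter was removed in scikit-learn 1.2
bayesian = BayesianRidge(fit_intercept=False, normalize=True)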
bayesian_search = RandomizedSearchCV(bayesian, bayesian_grid, scoring='neg_mean_squared_error', cv=3, return_train_score=True, n_jobs=-1, n_iter=40, verbose=1)
bayesian_search.fit(poly_X_train_confirmed, y_train_confirmed)
Fitting 3 folds for each of 40 candidates, totalling 120 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done 120 out of 120 | elapsed:   16.3s finished
Out[156]:
RandomizedSearchCV(cv=3, error_score='raise-deprecating',
                   estimator=BayesianRidge(alpha_1=1e-06, alpha_2=1e-06,
                                           compute_score=False, copy_X=True,
                                           fit_intercept=False, lambda_1=1e-06,
                                           lambda_2=1e-06, n_iter=300,
                                           normalize=True, tol=0.001,
                                           verbose=False),
                   iid='warn', n_iter=40, n_jobs=-1,
                   param_distributions={'alpha_1': [1e-07, 1e-06, 1e-05,
                                                    0.0001],
                                        'alpha_2': [1e-07, 1e-06, 1e-05,
                                                    0.0001],
                                        'lambda_1': [1e-07, 1e-06, 1e-05,
                                                     0.0001],
                                        'lambda_2': [1e-07, 1e-06, 1e-05,
                                                     0.0001],
                                        'tol': [0.0001, 0.001, 0.01]},
                   pre_dispatch='2*n_jobs', random_state=None, refit=True,
                   return_train_score=True, scoring='neg_mean_squared_error',
                   verbose=1)
In [157]:
bayesian_search.best_params_
Out[157]:
{'tol': 0.01,
 'lambda_2': 0.0001,
 'lambda_1': 1e-07,
 'alpha_2': 1e-06,
 'alpha_1': 0.0001}
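The grid above contains 3 x 4 x 4 x 4 x 4 = 768 parameter combinations, and the search evaluates only 40 of them. Since random_state is None, the selected best_params_ can differ between runs. The sketch below illustrates the sampling idea with the standard library only; it is not scikit-learn's implementation, and the seed is an arbitrary choice.

```python
import itertools
import random

grid = {
    'tol': [1e-4, 1e-3, 1e-2],
    'alpha_1': [1e-7, 1e-6, 1e-5, 1e-4],
    'alpha_2': [1e-7, 1e-6, 1e-5, 1e-4],
    'lambda_1': [1e-7, 1e-6, 1e-5, 1e-4],
    'lambda_2': [1e-7, 1e-6, 1e-5, 1e-4],
}

# Every combination an exhaustive grid search would have to try
names = list(grid)
all_candidates = [dict(zip(names, values))
                  for values in itertools.product(*grid.values())]
print(len(all_candidates))

# A randomized search only evaluates a sample of them
rng = random.Random(0)  # fixed seed so the sample is repeatable (arbitrary choice)
sampled = rng.sample(all_candidates, 40)
print(f'{len(sampled) / len(all_candidates):.1%} of the grid is evaluated')
```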
In [158]:
bayesian_confirmed = bayesian_search.best_estimator_
test_bayesian_pred = bayesian_confirmed.predict(poly_X_test_confirmed)
bayesian_pred = bayesian_confirmed.predict(poly_future_forcast)
print('MAE:', mean_absolute_error(test_bayesian_pred, y_test_confirmed))
print('MSE:',mean_squared_error(test_bayesian_pred, y_test_confirmed))
MAE: 16.83308751576157
MSE: 429.5266951153467
In [159]:
plt.plot(y_test_confirmed)
plt.plot(test_bayesian_pred)
Out[159]:
[<matplotlib.lines.Line2D at 0x14826794848>]

Future predictions using SVM:

In [160]:
# Future predictions using SVM 
print('SVM future predictions:')
set(zip(future_forcast_dates[-30:], np.round(svm_pred[-30:])))
SVM future predictions:
Out[160]:
{('03/29/2020', 892.0),
 ('03/30/2020', 975.0),
 ('03/31/2020', 1065.0),
 ('04/01/2020', 1161.0),
 ('04/02/2020', 1264.0),
 ('04/03/2020', 1374.0),
 ('04/04/2020', 1493.0),
 ('04/05/2020', 1620.0),
 ('04/06/2020', 1756.0),
 ('04/07/2020', 1901.0),
 ('04/08/2020', 2056.0),
 ('04/09/2020', 2222.0),
 ('04/10/2020', 2398.0),
 ('04/11/2020', 2586.0),
 ('04/12/2020', 2786.0),
 ('04/13/2020', 2999.0),
 ('04/14/2020', 3225.0),
 ('04/15/2020', 3465.0),
 ('04/16/2020', 3720.0),
 ('04/17/2020', 3991.0),
 ('04/18/2020', 4278.0),
 ('04/19/2020', 4581.0),
 ('04/20/2020', 4903.0),
 ('04/21/2020', 5243.0),
 ('04/22/2020', 5602.0),
 ('04/23/2020', 5982.0),
 ('04/24/2020', 6382.0),
 ('04/25/2020', 6805.0),
 ('04/26/2020', 7251.0),
 ('04/27/2020', 7722.0)}

Visualizing Graph for Forecasts(SVR):

In [161]:
# a list (rather than a set) keeps the forecast in chronological order
svm_zip = list(zip(future_forcast_dates[-30:], np.round(svm_pred[-30:])))
# unzipping values 
Date, Conf_cases = zip(*svm_zip) 
Dates = list(Date)
Confirm_cases = list(Conf_cases)
print(Dates)
print(Confirm_cases)
['03/29/2020', '03/30/2020', '03/31/2020', '04/01/2020', '04/02/2020', '04/03/2020', '04/04/2020', '04/05/2020', '04/06/2020', '04/07/2020', '04/08/2020', '04/09/2020', '04/10/2020', '04/11/2020', '04/12/2020', '04/13/2020', '04/14/2020', '04/15/2020', '04/16/2020', '04/17/2020', '04/18/2020', '04/19/2020', '04/20/2020', '04/21/2020', '04/22/2020', '04/23/2020', '04/24/2020', '04/25/2020', '04/26/2020', '04/27/2020']
[892.0, 975.0, 1065.0, 1161.0, 1264.0, 1374.0, 1493.0, 1620.0, 1756.0, 1901.0, 2056.0, 2222.0, 2398.0, 2586.0, 2786.0, 2999.0, 3225.0, 3465.0, 3720.0, 3991.0, 4278.0, 4581.0, 4903.0, 5243.0, 5602.0, 5982.0, 6382.0, 6805.0, 7251.0, 7722.0]
In [162]:
# dictionary of lists (named svm_data to avoid shadowing the built-in dict)
svm_data = {'dates': Dates, 'cases': Confirm_cases}

svm_df = pd.DataFrame(svm_data)
svm_df.head()
Out[162]:
dates cases
0 03/29/2020 892.0
1 03/30/2020 975.0
2 03/31/2020 1065.0
3 04/01/2020 1161.0
4 04/02/2020 1264.0
In [163]:
plt.figure(figsize=[28,28])
plt.subplot(2,2,1)
ax1 = sns.lineplot(x="dates", y="cases", data = svm_df, markers=True, dashes=True,label='SVM Forecast',color='red')
ax1.set(xlabel='Date',ylabel='Number of Cases Forecast')
plt.xticks(rotation=90)
Out[163]:
[Figure: line plot of the SVM forecast; x-axis: Date, y-axis: Number of Cases Forecast]

The SVM model predicted that the total number of confirmed cases in India would reach about 7,700 by 27 April 2020.

Future predictions using Polynomial Regression

In [164]:
# Future predictions using Polynomial Regression 
linear_pred = linear_pred.reshape(1,-1)[0]
print('Polynomial regression future predictions:')
set(zip(future_forcast_dates[-30:], np.round(linear_pred[-30:])))
Polynomial regression future predictions:
Out[164]:
{('03/29/2020', 1142.0),
 ('03/30/2020', 1304.0),
 ('03/31/2020', 1484.0),
 ('04/01/2020', 1682.0),
 ('04/02/2020', 1901.0),
 ('04/03/2020', 2141.0),
 ('04/04/2020', 2404.0),
 ('04/05/2020', 2692.0),
 ('04/06/2020', 3006.0),
 ('04/07/2020', 3349.0),
 ('04/08/2020', 3721.0),
 ('04/09/2020', 4125.0),
 ('04/10/2020', 4563.0),
 ('04/11/2020', 5037.0),
 ('04/12/2020', 5548.0),
 ('04/13/2020', 6099.0),
 ('04/14/2020', 6693.0),
 ('04/15/2020', 7331.0),
 ('04/16/2020', 8016.0),
 ('04/17/2020', 8750.0),
 ('04/18/2020', 9537.0),
 ('04/19/2020', 10379.0),
 ('04/20/2020', 11277.0),
 ('04/21/2020', 12237.0),
 ('04/22/2020', 13259.0),
 ('04/23/2020', 14348.0),
 ('04/24/2020', 15507.0),
 ('04/25/2020', 16738.0),
 ('04/26/2020', 18045.0),
 ('04/27/2020', 19431.0)}

Visualizing Graph for Polynomial Regression:

In [165]:
# a list (rather than a set) keeps the forecast in chronological order
poly_zip = list(zip(future_forcast_dates[-30:], np.round(linear_pred[-30:])))
# unzipping values 
Date, Conf_cases = zip(*poly_zip) 
Dates = list(Date)
Confirm_cases = list(Conf_cases)
print(Dates)
print(Confirm_cases)
['03/29/2020', '03/30/2020', '03/31/2020', '04/01/2020', '04/02/2020', '04/03/2020', '04/04/2020', '04/05/2020', '04/06/2020', '04/07/2020', '04/08/2020', '04/09/2020', '04/10/2020', '04/11/2020', '04/12/2020', '04/13/2020', '04/14/2020', '04/15/2020', '04/16/2020', '04/17/2020', '04/18/2020', '04/19/2020', '04/20/2020', '04/21/2020', '04/22/2020', '04/23/2020', '04/24/2020', '04/25/2020', '04/26/2020', '04/27/2020']
[1142.0, 1304.0, 1484.0, 1682.0, 1901.0, 2141.0, 2404.0, 2692.0, 3006.0, 3349.0, 3721.0, 4125.0, 4563.0, 5037.0, 5548.0, 6099.0, 6693.0, 7331.0, 8016.0, 8750.0, 9537.0, 10379.0, 11277.0, 12237.0, 13259.0, 14348.0, 15507.0, 16738.0, 18045.0, 19431.0]
In [166]:
# dictionary of lists (named poly_data to avoid shadowing the built-in dict)
poly_data = {'dates': Dates, 'cases': Confirm_cases}

poly_df = pd.DataFrame(poly_data)
poly_df.head()
Out[166]:
dates cases
0 03/29/2020 1142.0
1 03/30/2020 1304.0
2 03/31/2020 1484.0
3 04/01/2020 1682.0
4 04/02/2020 1901.0
In [167]:
plt.figure(figsize=[28,28])
plt.subplot(2,2,1)
ax1 = sns.lineplot(x="dates", y="cases", data = poly_df, markers=True, dashes=True,label='Poly_reg Forecast',color='red')
ax1.set(xlabel='Date',ylabel='Number of Cases Forecast')
plt.xticks(rotation=90)
Out[167]:
[Figure: line plot of the polynomial regression forecast; x-axis: Date, y-axis: Number of Cases Forecast]

The polynomial regression model predicted that the total number of confirmed cases in India would reach about 19,400 by 27 April 2020.

Future predictions using Bayesian Ridge Regression

In [169]:
# Future predictions using Bayesian Ridge Regression 
print('Ridge regression future predictions:')
set(zip(future_forcast_dates[-30:], np.round(bayesian_pred[-30:])))
Ridge regression future predictions:
Out[169]:
{('03/29/2020', 1123.0),
 ('03/30/2020', 1276.0),
 ('03/31/2020', 1443.0),
 ('04/01/2020', 1627.0),
 ('04/02/2020', 1829.0),
 ('04/03/2020', 2048.0),
 ('04/04/2020', 2288.0),
 ('04/05/2020', 2549.0),
 ('04/06/2020', 2832.0),
 ('04/07/2020', 3139.0),
 ('04/08/2020', 3471.0),
 ('04/09/2020', 3830.0),
 ('04/10/2020', 4217.0),
 ('04/11/2020', 4633.0),
 ('04/12/2020', 5081.0),
 ('04/13/2020', 5561.0),
 ('04/14/2020', 6077.0),
 ('04/15/2020', 6629.0),
 ('04/16/2020', 7219.0),
 ('04/17/2020', 7849.0),
 ('04/18/2020', 8522.0),
 ('04/19/2020', 9239.0),
 ('04/20/2020', 10003.0),
 ('04/21/2020', 10815.0),
 ('04/22/2020', 11677.0),
 ('04/23/2020', 12593.0),
 ('04/24/2020', 13565.0),
 ('04/25/2020', 14594.0),
 ('04/26/2020', 15683.0),
 ('04/27/2020', 16836.0)}

Visualizing the forecast predicted by Bayesian Model:

In [170]:
# a list (rather than a set) keeps the forecast in chronological order
bay_zip = list(zip(future_forcast_dates[-30:], np.round(bayesian_pred[-30:])))
# unzipping values 
Date, Conf_cases = zip(*bay_zip) 
Dates = list(Date)
Confirm_cases = list(Conf_cases)
print(Dates)
print(Confirm_cases)
['03/29/2020', '03/30/2020', '03/31/2020', '04/01/2020', '04/02/2020', '04/03/2020', '04/04/2020', '04/05/2020', '04/06/2020', '04/07/2020', '04/08/2020', '04/09/2020', '04/10/2020', '04/11/2020', '04/12/2020', '04/13/2020', '04/14/2020', '04/15/2020', '04/16/2020', '04/17/2020', '04/18/2020', '04/19/2020', '04/20/2020', '04/21/2020', '04/22/2020', '04/23/2020', '04/24/2020', '04/25/2020', '04/26/2020', '04/27/2020']
[1123.0, 1276.0, 1443.0, 1627.0, 1829.0, 2048.0, 2288.0, 2549.0, 2832.0, 3139.0, 3471.0, 3830.0, 4217.0, 4633.0, 5081.0, 5561.0, 6077.0, 6629.0, 7219.0, 7849.0, 8522.0, 9239.0, 10003.0, 10815.0, 11677.0, 12593.0, 13565.0, 14594.0, 15683.0, 16836.0]
In [171]:
# dictionary of lists (named bay_data to avoid shadowing the built-in dict)
bay_data = {'dates': Dates, 'cases': Confirm_cases}

bay_df = pd.DataFrame(bay_data)
bay_df.head()
Out[171]:
dates cases
0 03/29/2020 1123.0
1 03/30/2020 1276.0
2 03/31/2020 1443.0
3 04/01/2020 1627.0
4 04/02/2020 1829.0
In [172]:
plt.figure(figsize=[28,28])
plt.subplot(2,2,1)
ax1 = sns.lineplot(x="dates", y="cases", data = bay_df, markers=True, dashes=True,label='Bay_Ridge_reg Forecast',color='red')
ax1.set(xlabel='Date',ylabel='Number of Cases Forecast')
plt.xticks(rotation=90)
Out[172]:
[Figure: line plot of the Bayesian ridge regression forecast; x-axis: Date, y-axis: Number of Cases Forecast]

The Bayesian ridge model predicted that the total number of confirmed cases in India would reach about 16,800 by 27 April 2020.

These predictions do not take one major factor into account: India's nationwide lockdown. The numbers are therefore subject to change.

Data from the first week of April should set the proper tone for future predictions.
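One rough way to compare the three forecasts is the doubling time each one implies. The sketch below takes the first and last forecast values from the tables above and assumes constant exponential growth between them. That assumption is a simplification made here for illustration and is not part of any of the fitted models; the helper name doubling_time is hypothetical.

```python
import math

# First and last forecast values (03/29/2020 and 04/27/2020), copied from the tables above
forecasts = {
    'SVM': (892.0, 7722.0),
    'Polynomial regression': (1142.0, 19431.0),
    'Bayesian ridge': (1123.0, 16836.0),
}
days_between = 29  # 03/29/2020 to 04/27/2020

def doubling_time(first, last, days):
    """Days needed to double, assuming constant exponential growth between two points."""
    growth_rate = math.log(last / first) / days
    return math.log(2) / growth_rate

doubling = {name: doubling_time(a, b, days_between)
            for name, (a, b) in forecasts.items()}
for name, value in doubling.items():
    print(f'{name}: doubles roughly every {value:.1f} days')
```

If the lockdown slows transmission, the observed doubling time in April would be longer than these values, which is one simple signal to watch for.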

Getting information about countries/regions that have confirmed coronavirus cases:

In [173]:
import operator 
latest_confirmed = confirmed_df[dates[-1]]
latest_deaths = deaths_df[dates[-1]]

unique_countries =  list(confirmed_df['Country/Region'].unique())
country_confirmed_cases = []
no_cases = []
for i in unique_countries:
    cases = latest_confirmed[confirmed_df['Country/Region']==i].sum()
    if cases > 0:
        country_confirmed_cases.append(cases)
    else:
        no_cases.append(i)
        
for i in no_cases:
    unique_countries.remove(i)
    
# sort countries by the number of confirmed cases
unique_countries = [k for k, v in sorted(zip(unique_countries, country_confirmed_cases), key=operator.itemgetter(1), reverse=True)]
for i in range(len(unique_countries)):
    country_confirmed_cases[i] = latest_confirmed[confirmed_df['Country/Region']==unique_countries[i]].sum()
# number of cases per country/region
print('Confirmed Cases by Countries/Regions:')
for i in range(len(unique_countries)):
    print(f'{unique_countries[i]}: {country_confirmed_cases[i]} cases') 
Confirmed Cases by Countries/Regions:
US: 121478 cases
Italy: 92472 cases
China: 81999 cases
Spain: 73235 cases
Germany: 57695 cases
France: 38105 cases
Iran: 35408 cases
United Kingdom: 17312 cases
Switzerland: 14076 cases
Netherlands: 9819 cases
Korea, South: 9478 cases
Belgium: 9134 cases
Austria: 8271 cases
Turkey: 7402 cases
Canada: 5576 cases
Portugal: 5170 cases
Norway: 4015 cases
Brazil: 3904 cases
Australia: 3640 cases
Israel: 3619 cases
Sweden: 3447 cases
Czechia: 2631 cases
Ireland: 2415 cases
Denmark: 2366 cases
Malaysia: 2320 cases
Chile: 1909 cases
Luxembourg: 1831 cases
Ecuador: 1823 cases
Japan: 1693 cases
Poland: 1638 cases
Pakistan: 1495 cases
Romania: 1452 cases
Russia: 1264 cases
Thailand: 1245 cases
Saudi Arabia: 1203 cases
South Africa: 1187 cases
Finland: 1167 cases
Indonesia: 1155 cases
Philippines: 1075 cases
Greece: 1061 cases
India: 987 cases
Iceland: 963 cases
Singapore: 802 cases
Panama: 786 cases
Dominican Republic: 719 cases
Mexico: 717 cases
Diamond Princess: 712 cases
Argentina: 690 cases
Slovenia: 684 cases
Peru: 671 cases
Serbia: 659 cases
Croatia: 657 cases
Estonia: 645 cases
Colombia: 608 cases
Qatar: 590 cases
Egypt: 576 cases
Iraq: 506 cases
Bahrain: 476 cases
United Arab Emirates: 468 cases
Algeria: 454 cases
New Zealand: 451 cases
Lebanon: 412 cases
Armenia: 407 cases
Morocco: 402 cases
Lithuania: 394 cases
Ukraine: 356 cases
Hungary: 343 cases
Bulgaria: 331 cases
Andorra: 308 cases
Latvia: 305 cases
Costa Rica: 295 cases
Slovakia: 292 cases
Taiwan*: 283 cases
Tunisia: 278 cases
Uruguay: 274 cases
Bosnia and Herzegovina: 258 cases
Jordan: 246 cases
North Macedonia: 241 cases
Kuwait: 235 cases
Moldova: 231 cases
Kazakhstan: 228 cases
San Marino: 224 cases
Burkina Faso: 207 cases
Albania: 197 cases
Azerbaijan: 182 cases
Cyprus: 179 cases
Vietnam: 174 cases
Oman: 152 cases
Malta: 149 cases
Ghana: 141 cases
Senegal: 130 cases
Brunei: 120 cases
Cuba: 119 cases
Venezuela: 119 cases
Sri Lanka: 113 cases
Afghanistan: 110 cases
Uzbekistan: 104 cases
Mauritius: 102 cases
Cote d'Ivoire: 101 cases
Cambodia: 99 cases
West Bank and Gaza: 98 cases
Honduras: 95 cases
Belarus: 94 cases
Cameroon: 91 cases
Kosovo: 91 cases
Georgia: 90 cases
Nigeria: 89 cases
Montenegro: 84 cases
Bolivia: 74 cases
Trinidad and Tobago: 74 cases
Congo (Kinshasa): 65 cases
Rwanda: 60 cases
Kyrgyzstan: 58 cases
Liechtenstein: 56 cases
Paraguay: 56 cases
Bangladesh: 48 cases
Monaco: 42 cases
Kenya: 38 cases
Guatemala: 34 cases
Jamaica: 30 cases
Uganda: 30 cases
Zambia: 28 cases
Barbados: 26 cases
Madagascar: 26 cases
Togo: 25 cases
El Salvador: 19 cases
Mali: 18 cases
Ethiopia: 16 cases
Maldives: 16 cases
Djibouti: 14 cases
Tanzania: 14 cases
Equatorial Guinea: 12 cases
Mongolia: 12 cases
Dominica: 11 cases
Bahamas: 10 cases
Niger: 10 cases
Eswatini: 9 cases
Guinea: 8 cases
Guyana: 8 cases
Haiti: 8 cases
Namibia: 8 cases
Seychelles: 8 cases
Suriname: 8 cases
Mozambique: 8 cases
Laos: 8 cases
Burma: 8 cases
Antigua and Barbuda: 7 cases
Gabon: 7 cases
Zimbabwe: 7 cases
Grenada: 7 cases
Benin: 6 cases
Eritrea: 6 cases
Holy See: 6 cases
Angola: 5 cases
Cabo Verde: 5 cases
Fiji: 5 cases
Mauritania: 5 cases
Nepal: 5 cases
Sudan: 5 cases
Syria: 5 cases
Congo (Brazzaville): 4 cases
Nicaragua: 4 cases
Bhutan: 3 cases
Central African Republic: 3 cases
Chad: 3 cases
Gambia: 3 cases
Liberia: 3 cases
Saint Lucia: 3 cases
Somalia: 3 cases
Libya: 3 cases
Belize: 2 cases
Guinea-Bissau: 2 cases
Saint Kitts and Nevis: 2 cases
MS Zaandam: 2 cases
Papua New Guinea: 1 cases
Saint Vincent and the Grenadines: 1 cases
Timor-Leste: 1 cases
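The same per-country totals can be obtained more compactly with a pandas groupby. The sketch below uses a toy frame with made-up country names and numbers; the date column label '3/28/20' is an assumption about how the latest column is named. Countries with equal totals may come out in a different order than in the loop above.

```python
import pandas as pd

# Toy frame in the same layout as confirmed_df; names and numbers are made up
toy_confirmed = pd.DataFrame({
    'Country/Region': ['Aland', 'Aland', 'Borduria', 'Cascadia'],
    '3/28/20': [10, 5, 40, 0],
})
latest = '3/28/20'  # assumed label of the most recent date column

# Sum provinces per country, then sort from most to fewest cases
by_country = (
    toy_confirmed.groupby('Country/Region')[latest]
    .sum()
    .sort_values(ascending=False)
)
by_country = by_country[by_country > 0]  # drop regions without confirmed cases

for country, cases in by_country.items():
    print(f'{country}: {cases} cases')
```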

Conclusion and Recommendations:

Conclusion:

COVID-19 is spreading with astonishing speed; COVID-19 outbreaks in any setting have very serious consequences; and there is now strong evidence that non-pharmaceutical interventions can reduce and even interrupt transmission. Concerningly, global and national preparedness planning is often ambivalent about such interventions. However, to reduce COVID-19 illness and death, near-term readiness planning must embrace the large-scale implementation of high-quality, non-pharmaceutical public health measures. These measures must fully incorporate immediate case detection and isolation, rigorous close contact tracing and monitoring/quarantine, and direct population/community engagement.

Recommendations:
  1. Immediately activate the highest level of national Response Management protocols to ensure the all-of-government and all-of-society approach needed to contain COVID-19 with non-pharmaceutical public health measures;
  2. Prioritize active, exhaustive case finding and immediate testing and isolation, painstaking contact tracing and rigorous quarantine of close contacts;
  3. Fully educate the general public on the seriousness of COVID-19 and their role in preventing its spread;
  4. Immediately expand surveillance to detect COVID-19 transmission chains, by testing all patients with atypical pneumonias, conducting screening in some patients with upper respiratory illnesses and/or recent COVID-19 exposure, and adding testing for the COVID-19 virus to existing surveillance systems (e.g. systems for influenza-like-illness and SARI); and
  5. Conduct multi-sector scenario planning and simulations for the deployment of even more stringent measures to interrupt transmission chains as needed (e.g. the suspension of large-scale gatherings and the closure of schools and workplaces).